ABSTRACT

A method of interframe motion measurement and compensation, based
on approximation of piecewise linear translation of small rectangular
areas, is presented. This method significantly improves the temporal
correlation (10 to 12 dB reduction in interframe variance) and permits
reduction of the sampling rate along the temporal axis. A technique for
visual characterization of interframe characteristics, based on temporal
cross-sections, is described.

Some commonly used statistical models for prediction of the
variances of transform coefficients, in intraframe and interframe
transform coding, are compared. Results show that statistics measured in
the transform domain give a 2 to 4 dB improvement over commonly used
separable covariance models. A method of adaptation to local changes
in image statistics in transform and hybrid coding is developed. This
results in a substantial improvement (about 4 dB) in performance. A hybrid
coding scheme using motion compensation, frame skipping, and interpolation
of the skipped frames along the motion trajectory is presented, which
further improves the coder performance.

Interframe transform and hybrid coding schemes are compared against
intraframe transform coding and some simple interframe predictive coding
schemes. Also, distortion-rate curves for hybrid coding, based on models
of interframe motion, have been plotted. The application of transform and
hybrid coding to biomedical x-ray images has been considered, and the
results show that significant compression can be realized for these
images. The effects of distortion due to data compression of x-ray
projection images (used in computed tomography) on the reconstructed
images (by inverse Radon transform) have been evaluated.

A method for joint optimization of source coding and channel
coding for PCM transmission over noisy channels is presented. It is
shown how this method can be applied to transform coding of images. The
results show that this method performs significantly better than the
conventional error correcting codes or schemes with no channel protection.
At a rate of 1 bit/pixel and a channel error probability of $10^{-2}$, the
proposed method results in a 10 dB improvement over an ordinary transform
coder.

The performance of several transforms has been compared for some
commonly used intraframe nonseparable covariance models. The results
indicate that the cosine transform performs very close (within 0.05 dB at
1 bit/pixel) to the optimum K-L transform.
CHAPTER I

INTRODUCTION 

A monochrome image* is a function of two spatial variables. Many
imaging systems generate multiple frames of images. These multiple
frames could be a function of time (e.g., in television) or other
variables (e.g., angle of view and time in dynamic spatial reconstruction,
Robb [63]). Sampling of these variables depends on the application, e.g.,
in broadcast television the time axis is sampled at 50-60 samples per
second to avoid flicker.

For digital processing, an image frame is sampled along both the
spatial axes. Nyquist sampling theory provides the most important step
towards a reduction of the digital information required to represent a
continuous signal. This theory states that any bandlimited signal, sampled
at a rate greater than twice its highest frequency content, can be
reproduced without introducing any distortion. In the simplest binary
coding, each sample of an image, called a pixel, is quantized independently
by a finite number of bits. This is called pulse code modulation (PCM).
For raw image data, each pixel is uniformly quantized and is represented
by a fixed number of bits. For human viewing of an image, 8 bits/pixel
gives sufficient resolution. For broadcast television, the data rate for
PCM transmission of images is approximately 65 Mbits/sec.
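
As an illustration of the PCM step described above, the following Python
sketch quantizes one sample uniformly to a fixed number of bits. The
function names, the [lo, hi] input range, and the mid-interval
reconstruction rule are assumptions made for this example and are not taken
from the text.

```python
def pcm_encode(sample, bits=8, lo=0.0, hi=1.0):
    """Map a sample in [lo, hi] to an integer code word of `bits` bits."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    code = int((sample - lo) / step)
    # Clamp so that out-of-range samples still give a valid code word.
    return min(max(code, 0), levels - 1)


def pcm_decode(code, bits=8, lo=0.0, hi=1.0):
    """Reconstruct the sample at the middle of its quantization interval."""
    step = (hi - lo) / 2 ** bits
    return lo + (code + 0.5) * step
```

With 8 bits there are 256 levels, so for in-range samples the
reconstruction error of this sketch is at most half a quantization step.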

1.1  Interframe  Data  Compression  Problem 

It is evident that the enormous data rates generated by multiple
frame images would result in high costs of transmission and/or storage.

*  as  opposed  to  a  color  image 
Thus, there is a great need for reducing the data rates as much as possible.
Interlacing of the fields in television broadcasting is a form of data
compression which exploits the retention properties of human vision.

A  simple  statistical  or  visual  analysis  of  the  image  data  reveals 
that  there  is  very  high  correlation  between  adjacent  pixels,  both  within 
a  single  frame  and  from  frame  to  frame.  This  high  correlation  results  in 
significant  redundant  information  in  the  original  raw  data.  The  basic 
problem  of  data  compression  is  to  effectively  exploit  this  redundancy  to 
reduce  the  data  rates. 

A number of data compression (also called coding) schemes have been
developed for single image frames [5,22,29,31,36,38,51,53,58,75,77,83,87].
These are called intraframe coding schemes and are based on exploitation
of spatial redundancy. Interframe coding schemes, on the other hand,
utilize the redundancy between the frames as well as within the frames
and generally achieve higher compression than the intraframe schemes.

In principle, it is possible to compress the digitized data without
introducing any further distortion (digitization itself introduces
distortion). However, such schemes do not yield large enough compression
ratios. It is possible to achieve much larger compression by introducing
small but acceptable (depending on the application) distortion in the
originally digitized data. Thus, we need some quantitative and qualitative
measures of the distortion. The problem of data compression then becomes
the minimization of data rates for a given distortion level or,
equivalently, minimization of the distortion for a given data rate.

The quantitative distortion measure we use is the well known mean
square error (MSE or m.s.e.) criterion. Let $u_{k,i,j}$ be the intensity of
a sample of a digitized three dimensional image data array and $u'_{k,i,j}$
be its reproduced value after data compression. Then the MSE due to data
compression is defined as

$$\mathrm{MSE} = \frac{1}{N^3} \sum_{k,i,j} \left( u_{k,i,j} - u'_{k,i,j} \right)^2$$

where $N^3$ is the total number of samples in the array over which the MSE
is being measured. Such a global criterion of overall mean square error
is not always very meaningful, especially at moderate to high levels of
distortion. So it has to be used together with some qualitative measures
to judge the quality of the reproduced images. Some qualitative measures
are given in [12]. A simple method is to judge the images by viewing the
encoded image and comparing it with the original image. Inspection
of the error images (amplified to give full dynamic range) is also
very informative about the distribution and structure of the errors.
Sometimes the MSE measured over locally homogeneous regions of an image
is also quite useful. More sophisticated criteria, such as frequency
weighted mean square error [53], or visibility of errors, etc., are
possible, but are difficult to incorporate in interframe data compression
algorithms.

The MSE is also expressed by a quantity called the signal to noise
ratio (SNR or S/N), defined in decibels (dB) as

$$\mathrm{SNR} = 10 \log_{10} \frac{(\text{Peak to Peak Signal})^2}{\mathrm{MSE}}$$

1.2  Digital  Image  Transmission  System  and  Applications  of  Data  Compression 

Figure 1-1 shows a schematic of a typical digital image transmission
(or storage-retrieval) system. Block 1 consists of an image sensing or
acquisition system. It could consist of a continuous sensor (e.g., a
raster scanning camera) or an array of detectors arranged at the sampling
grid. The acquired image signal is then sampled (if continuous) and
digitized in block 2. Block 3, which is of most interest to us, contains a
processor which performs data compression. The compressed data is encoded
into bits in block 4. Error protection for transmission over noisy channels
is also done here. The binary coded data is transmitted or stored
(block 5). Blocks 6 through 9 perform the inverse of most of the
functions performed in blocks 4 through 1 (not all the functions performed
in these blocks are invertible, e.g., quantization). For the most part
(chapters II to V) we will be concerned with blocks 3 and 7. There, we
have integrated blocks 4-6 into a single block named "channel". In
chapter VI, where we deal with image coding for noisy transmission channels,
we will consider blocks 4-6 in detail.

Figure 1-1: An Overview for Digital Transmission or Storage Retrieval System for Images

There are several considerations in developing the data compression
algorithms of block 3 in addition to reducing the data rate. These
considerations include the complexity of the binary encoder and decoder,
real time processing in blocks 3 and/or 7, uniform or variable data rates,
amount of storage, size of channel buffer, noise characteristics of the
channel, etc.

There are many applications where interframe data compression of
images could be used with great savings in transmission/storage costs.
These include television transmission between stations, teleconferencing,
videotelephone, satellite images, biomedical x-ray images in computer
aided tomography and angiocardiography, etc. For future projections of
satellite communication traffic for television and teleconferencing,
see [73].
1.3  Background  and  Review  of  Current  Multiframe  Coding  Techniques

The multiframe data compression schemes reported in the literature
have been applied mostly to video images, generated by a television or
a movie camera, where the interframe variable is time. In some
applications, e.g., television and videophone, a time evolving scene is
registered as a sequence of equi-interval images by a camera, which is
mostly stationary. In other applications, such as a remotely piloted
vehicle (RPV), a moving camera captures a time changing scene. In the
former applications, the changes from one frame to the next, with respect
to a fixed location in the frame, are mostly localized in some areas of the
frame, while in the latter case such changes occur throughout the frame.
These changes, including those due to zooming the camera, are called
interframe motion.

A simple method of detecting motion between two consecutive image
frames is by measuring the temporal changes between them. Suppose
$u_{k,i,j}$ represents the intensity of the (i,j)th pixel of the kth frame.
If the interframe difference (IFD) signal,

$$\delta_{k,i,j} = u_{k,i,j} - u_{k-1,i,j},$$

exceeds a certain threshold, then the (i,j)th pixel of the kth frame is
classified as moving. An inherent assumption here is that the illumination
from one frame to the next remains unchanged. Generally, the moving pixels
occur in clusters [13], and constitute the so called moving areas. The
rest of the image constitutes the stationary areas. Techniques which are
adjusted according to the local changes in spatial and temporal
characteristics of images are called adaptive methods.
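
The thresholding of the interframe difference described above can be
sketched in Python as follows. Comparing the magnitude (absolute value) of
the difference against the threshold, and the function and argument names,
are assumptions made for this example.

```python
def moving_mask(prev_frame, cur_frame, threshold):
    """Return a 2-D list of booleans: True where a pixel is classified as moving.

    A pixel is called moving when the magnitude of its interframe
    difference (current minus previous intensity) exceeds the threshold.
    """
    return [
        [abs(cur - prev) > threshold for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(prev_frame, cur_frame)
    ]
```

Connected groups of True entries in the returned mask correspond to the
moving areas; the remaining entries form the stationary areas.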
Interframe coding techniques could be broadly classified into three
categories. The first category, a subject most investigated, is called
predictive coding. Here the intensity value of a current pixel in a
raster scanned image is predicted from the knowledge of the previously
scanned and coded pixels. Generally, the pixel neighborhood used in
prediction is limited to a small set of pixels in the present and the
preceding image frames. This is because of the Markovian nature of the
data, and it limits the memory requirements to slightly more than one
image frame. The prediction error, which represents the new information
in the current pixel, is quantized and coded. For highly correlated data,
the prediction error is generally small and can be coded by much fewer
bits than required in PCM transmission.
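
A minimal Python sketch of such a predictive coder is given below, assuming
the simplest possible predictor (the previously reconstructed pixel on the
same scan line) and a uniform quantizer for the prediction error. The
function names, the step size, and the initial prediction of 128 are
assumptions made for this example.

```python
def dpcm_encode(line, step, initial=128):
    """Quantize the prediction error of each pixel along one scan line."""
    codes = []
    prediction = initial
    for pixel in line:
        error = pixel - prediction
        level = round(error / step)          # quantized prediction error
        codes.append(level)
        # Track the decoder's reconstruction so quantization errors
        # do not accumulate along the line.
        prediction = prediction + level * step
    return codes


def dpcm_decode(codes, step, initial=128):
    """Rebuild the scan line from the quantized prediction errors."""
    line = []
    prediction = initial
    for level in codes:
        prediction = prediction + level * step
        line.append(prediction)
    return line
```

Because the encoder predicts from reconstructed values, the reconstruction
error of each pixel in this sketch stays within half a quantizer step.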

The second category, developed more recently for multiframe images,
is called transform coding. While in most predictive coding schemes we
end up with as many samples as the input data, transform coding packs the
information into far fewer samples which need to be coded. Typically,
the interframe data is divided into smaller three dimensional arrays,
called sub-blocks, of equal size. Each sub-block is then operated upon
by a three dimensional, separable, unitary transform and the selected
transformed samples are quantized and coded independently. The sub-blocks
are reconstructed by taking the inverse transform. This method requires
storage equal to the number of frames in the temporal direction of the
sub-blocks.

The third category is a combination of the above two and is called
hybrid or transform/predictive coding. Here, each image frame is divided
into equal size sub-blocks and each sub-block is transformed by a two
dimensional separable unitary transform. Predictive coding is then
performed along the temporal axis, for each transformed sample, to exploit
frame to frame correlation.
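
The hybrid idea can be sketched in Python for a single 2 x 2 sub-block
followed through successive frames. The choice of a 2 x 2 Hadamard
transform, the uniform quantizer, the prediction coefficient rho, and all
function names are assumptions made for this example.

```python
def hadamard2x2(block):
    """2-D unitary Hadamard transform of [[a, b], [c, d]] (it is its own inverse)."""
    (a, b), (c, d) = block
    return [[(a + b + c + d) / 2, (a - b + c - d) / 2],
            [(a + b - c - d) / 2, (a - b - c + d) / 2]]


def hybrid_encode(blocks, step, rho=1.0):
    """Transform each frame's sub-block, then DPCM each coefficient over time."""
    prev = [[0.0, 0.0], [0.0, 0.0]]      # reconstructed coefficients, previous frame
    codes = []
    for block in blocks:
        coeffs = hadamard2x2(block)
        frame_codes = [[0, 0], [0, 0]]
        for m in range(2):
            for n in range(2):
                error = coeffs[m][n] - rho * prev[m][n]
                level = round(error / step)
                frame_codes[m][n] = level
                prev[m][n] = rho * prev[m][n] + level * step
        codes.append(frame_codes)
    return codes


def hybrid_decode(codes, step, rho=1.0):
    """Invert the temporal prediction, then the spatial transform."""
    prev = [[0.0, 0.0], [0.0, 0.0]]
    frames = []
    for frame_codes in codes:
        for m in range(2):
            for n in range(2):
                prev[m][n] = rho * prev[m][n] + frame_codes[m][n] * step
        frames.append(hadamard2x2(prev))
    return frames
```

For a sub-block that does not change between frames, the second and later
frames of this sketch produce all-zero codes, which is where the frame to
frame correlation is exploited.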

Interframe  coding  techniques  have  gained  momentum  only  since  the 
mid-sixties.  A  brief  review  of  the  recent  work  is  given  below  for  each 
of  the  three  categories. 

1.3.1 Predictive Coding Techniques - Due to their simple hardware
realization, considerable work has been done on predictive coding schemes.
In [52], prediction is based on the previous pixel of the same scan line
and the technique is essentially an intraframe one dimensional DPCM.

Recognition of the fact that a vast majority of pixels in a given
frame do not differ noticeably from the corresponding pixels of their
preceding frame (i.e., most of the image field in successive frames is
stationary) has led to interframe predictive schemes which do not require
transmission of stationary pixels. In [48], the prediction of a pixel
is simply the intensity value of the corresponding pixel in the preceding
frame. If the absolute value of the prediction error is larger than a
threshold, it is quantized and coded together with the address of the
pixel. Otherwise, the value of the pixel in the preceding frame is
repeated. This technique is called conditional replenishment because only
the moving areas of the image are replenished from one frame to the next.
It is evident that the rate at which the code is generated varies
depending on the size of and activity in the moving areas. If the
transmission channel is designed for an average data rate, then an
arbitrarily large buffer would be required to take care of large
fluctuations in the level of motion. To limit the buffer requirement to a
reasonable size, a variable threshold is used. The threshold is increased
as the buffer is filled up, thereby reducing the rate at which the data is
generated.
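
A simplified Python sketch of conditional replenishment with a buffer
controlled threshold follows. For brevity the new pixel value itself is
sent instead of a quantized prediction error, and the linear rule relating
buffer fill to threshold, as well as all names, are assumptions made for
this example.

```python
def replenish(prev_recon, cur_frame, threshold):
    """Return (updates, new_recon) for one frame.

    `updates` lists (row, column, value) for pixels whose change from the
    previous reconstructed frame exceeds the threshold; all other pixels
    are repeated from the previous frame.
    """
    updates = []
    new_recon = [row[:] for row in prev_recon]
    for i, (prev_row, cur_row) in enumerate(zip(prev_recon, cur_frame)):
        for j, (prev, cur) in enumerate(zip(prev_row, cur_row)):
            if abs(cur - prev) > threshold:
                updates.append((i, j, cur))
                new_recon[i][j] = cur
    return updates, new_recon


def adapt_threshold(base, buffer_fill, capacity, max_extra):
    """Raise the threshold linearly as the channel buffer fills up."""
    return base + max_extra * buffer_fill / capacity
```

A larger threshold classifies fewer pixels as changed, so fewer updates
enter the buffer while it is nearly full.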

It has been noted [57] that the spatial resolution in the
moving areas and the temporal resolution in the stationary areas of an
image can be reduced without noticeable reduction in the quality of
perception of the scene. This is called the exchange of spatial and
temporal resolution. In [44] a simple coder has been described which
exploits this exchange of resolution to reduce the data rate.

In  [13],  the  conditional  replenishment  method  of  [48]  has  been 
improved, and some of the techniques of [44], together with some other
adaptations,  have  been  used  to  result  in  a  more  efficient  coder  with 
increased  complexity.  We  have  simulated  this  technique  for  comparison 
purposes  and  a  brief  description  is  given  in  chapter  III.  A  review  of 
the  above  techniques  and  some  other  simpler  techniques  is  given  in  [24]. 

If some area of a frame is moving at a speed larger than 1 pixel/frame,
then it is obvious that for a pixel belonging to such an area the
correlation with intraframe neighbors would be higher than that with the
corresponding pixel in the preceding frame. Hence, in the absence of
knowledge about the direction of the motion, an intraframe prediction
error would have a lower variance than the interframe difference signal.
Thus, more compression can be achieved by coding the intraframe prediction
error in such moving areas. A scheme utilizing this is reported in [42].
To detect the moving areas, an adaptation better than that of [48] and
[13] is used.


The correlation, power spectrum, and some other properties of the
frame-difference signal are reported in [17]. Mathematical analysis as
well as experimental results are given. The results of this study have
been  used  in  [42]  to  design  a  better  segmenter  of  moving  and  stationary 
areas  for  the  purpose  of  coding.  The  simulation  results  for  the  entropy 
of  prediction  error  signal  for  a  variety  of  predictors  (using  different 
combinations  of  the  neighborhood  pixels  in  the  present  and  the  preceding 
frame)  at  various  speeds  are  reported  in  [23].  Pictures  with  different 
resolutions  have  been  used  to  compare  entropy  versus  resolution  at 
various  speeds. 

Most of the coders described in [13,24,42,44] are designed for a
data rate of 0.25 - 1.0 bits/pixel; for a signal with 1 MHz bandwidth,
the data rate is 0.5 - 2 Mbits/sec. In [26] a very low bit rate coder,
0.1 bit/pixel or 0.2 Mbits/sec., has been described which reproduces the
stationary areas quite well, but scenes containing moderate and large
motions are visibly smeared and blurred. It combines the cluster coding of
[13], a higher order prediction given by the line-to-line difference of
the frame difference signal, subsampling in spatial and temporal
directions as needed, temporal filtering, etc., to achieve this low data
rate. Low pass temporal filtering of the signal is done to reduce the
entropy of the prediction error. A near-ideal low pass filter would
require several frame memories. To limit the memory requirement to one
frame, a simple temporal filtering could be performed by sub-sampling in
the temporal direction and then interpolating the missing frames. In [26]
temporal filtering is performed by a simple averaging of the incoming frame
and the previously stored frame. Because of the temporal filtering by
averaging, the jerkiness in motion due to temporal subsampling is less
objectionable.

Most of the experiments reported in the aforementioned literature
were carried out on data sampled at about $2 \times 10^6$ samples/sec. The
coder of [13] at 1 bit/pixel thus has a data rate of 2 Mbits/sec. For
many applications, a higher resolution with a sampling rate of
$8 \times 10^6$ samples/sec. is desired. At 1 bit/pixel it would require a
high data rate of 8 Mbits/sec. A coder is described in [25] which
compresses the data rate to 1.5 Mbits/sec. or 0.19 bit/pixel. It is
reported to give acceptable quality with some blurring of the moving areas
in TV-conference type of applications, where for the most part the camera
is stationary and the moving subjects do not move too rapidly. This coder
utilizes the conditional replenishment of [13], the moving area segmenter
of [42], a higher order predictor, and a temporal filter, as in [26].
Variable quantizer levels and sub-sampling in the spatial domain are used
to maintain a smooth data rate.
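
The data rates quoted above follow from multiplying the sampling rate by
the average number of bits per pixel. The short Python sketch below repeats
that arithmetic; the function names are chosen for this example only.

```python
def data_rate_mbits(samples_per_sec, bits_per_pixel):
    """Channel data rate in Mbits/sec for a given sampling rate and bit rate."""
    return samples_per_sec * bits_per_pixel / 1e6


def bits_per_pixel(rate_mbits, samples_per_sec):
    """Average bits per pixel implied by a channel rate and a sampling rate."""
    return rate_mbits * 1e6 / samples_per_sec
```

For example, `data_rate_mbits(2e6, 1.0)` gives 2 Mbits/sec, and
`bits_per_pixel(1.5, 8e6)` gives 0.1875, which rounds to the 0.19 bit/pixel
quoted for the coder of [25].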

A coder which uses the interframe sample difference, temporal
filtering by attenuating the frame difference signal, spatial subsampling
in both directions when the buffer starts filling up, variable length code
words, etc., is reported in [84]. It is designed for 4 MHz videotelephone
and NTSC color TV signals, and operates at an average rate of 6.312
Mbits/sec. An interframe coder for NTSC color TV signals has also been
built by Nippon Electric in Japan and has been reported in [30]. This is
designed for high quality transmission and operates at 16-32 Mbits/sec.

A higher order prediction coder which differs considerably from
those discussed above is described in [10]. It assumes that the interframe
data is a sample of a 3-D wide sense stationary random process whose
covariance is separable and first order Markov in each dimension. Under
this assumption, the optimal predictor is based on seven pixels with
prediction coefficients directly related to the one step correlation in
each direction. Figure 1-2 shows the pixels and their prediction
coefficients used in the prediction of the point marked S. The prediction
error is quantized and coded using variable word length codes. The one
step temporal coefficient has been set equal to 1 to give better
prediction and a low data rate in the stationary areas. This scheme also
generates data at a nonuniform rate and, to keep the buffer requirement
reasonably low, some adaptations have been made to reduce the data rate
when the buffer starts filling up. Depending on the buffer contents, a
temporal-spatial filtering is performed in which a weighted average of the
interframe difference signals of the neighboring pixels is taken. The
weights are controlled by the buffer contents. An additional temporal
filtering is used when buffer overflow is imminent. This is achieved by
attenuating the output of the temporal-spatial filter used to reduce the
entropy of the signal.
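
For a separable first order Markov covariance, the seven point predictor
has a well known product form, sketched below in Python. This is the
textbook form under that model and not a transcription of Figure 1-2: the
handling of interlaced fields in [10] is not reproduced, and the names
a_h, a_v, a_t (one step horizontal, vertical, and temporal correlations)
are assumptions made for this example.

```python
def predict_seven_point(u, k, i, j, a_h, a_v, a_t):
    """Predict u[k][i][j] from its seven causal neighbors.

    u is indexed as [frame][line][sample]. The coefficients follow from
    expanding (1 - a_h z1)(1 - a_v z2)(1 - a_t z3) for a separable
    first order Markov model.
    """
    return (a_h * u[k][i][j - 1]
            + a_v * u[k][i - 1][j]
            + a_t * u[k - 1][i][j]
            - a_h * a_v * u[k][i - 1][j - 1]
            - a_h * a_t * u[k - 1][i][j - 1]
            - a_v * a_t * u[k - 1][i - 1][j]
            + a_h * a_v * a_t * u[k - 1][i - 1][j - 1])
```

Setting a_t to 1, as described in the text, makes the prediction of a
pixel in a completely stationary neighborhood equal to the co-located
pixel of the previous frame.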

In all of the above techniques, the motion in any area of the scene
is inferred from the magnitude of the interframe difference signal. No
efforts are made to measure the nature and the direction of the motion.

Figure 1-2: Coefficients of the Seven Point Predictor Described in [10]
(Legend: one step sample-to-sample correlation; one step line-to-line
correlation in the same field. NOTE: Each frame consists of two interlaced
fields.)

Due to computational and dimensionality problems, most of the motion
analysis of interframe images has been restricted to translational
motion, see e.g. [9,11,66]. In [66] a mathematical analysis is presented
where an image is divided into smaller areas or zones. For each zone
the displacement vector (x and y coordinates of the motion) and the
corresponding prediction errors (the prediction is based on the pixel
in the previous frame corresponding to the displacement vector of that
zone) are transmitted. For the purpose of analysis, a mathematical model
for the random video process is constructed to determine the optimum
size of the zone which can be represented by a single displacement vector.
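
The zone-based prediction described above can be sketched in Python as
follows. The prediction error for a given displacement vector follows the
description in the text; the exhaustive search used here to choose the
displacement is an assumption made for this example (the text does not say
how [66] obtains it), as are the function names and the square zone shape.

```python
def zone_prediction_error(prev, cur, top, left, size, dy, dx):
    """Error of predicting a zone of `cur` from `prev` displaced by (dy, dx)."""
    return [[cur[top + r][left + c] - prev[top + r - dy][left + c - dx]
             for c in range(size)] for r in range(size)]


def best_displacement(prev, cur, top, left, size, max_shift):
    """Search all displacements up to max_shift for the least error energy."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Skip displacements that would read outside the previous frame.
            if not (0 <= top - dy and top - dy + size <= len(prev)
                    and 0 <= left - dx and left - dx + size <= len(prev[0])):
                continue
            error = zone_prediction_error(prev, cur, top, left, size, dy, dx)
            energy = sum(e * e for row in error for e in row)
            if best is None or energy < best[0]:
                best = (energy, dy, dx)
    return best[1], best[2]
```

Only the displacement vector and the (ideally small) prediction errors of
the zone would then need to be coded.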

Methods for measuring small displacements, and for segmenting an image
into stationary and moving areas with different displacements, have been
considered in [11]. Based on linear regression and approximation, simple
formulas are derived by which the motion can be measured from the
interframe difference signal and first order spatial differences in the
x and y directions. To segment an image into moving and stationary areas,
it is assumed that there is only one moving object undergoing translation.
A two state Markov model with known state transition probabilities is
assumed. A maximum a posteriori (MAP) detector of the Markov chain is found
using the Viterbi algorithm by observing interframe differences and
assuming them to be an independent sequence. The method is then extended to
more than one moving object. A displacement measurement accuracy of
0.1 pixel/frame for motion up to 2-3 pixels/frame has been reported.
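
The idea of measuring a small displacement from the interframe difference
and the spatial differences can be illustrated in one dimension. The
Python sketch below uses a least squares fit under the first order
approximation u_k(x) = u_{k-1}(x - d), so that the interframe difference
is approximately -d times the spatial gradient. It is a generic
illustration and not the specific formulas of [11]; the function name and
the use of central differences are assumptions.

```python
def estimate_displacement_1d(prev_line, cur_line):
    """Least squares estimate of a small shift d between two scan lines."""
    numerator = 0.0
    denominator = 0.0
    for x in range(1, len(prev_line) - 1):
        # Central difference approximates the spatial gradient.
        gradient = (prev_line[x + 1] - prev_line[x - 1]) / 2.0
        ifd = cur_line[x] - prev_line[x]       # interframe difference
        numerator += ifd * gradient
        denominator += gradient * gradient
    if denominator == 0.0:
        return 0.0                              # flat line: shift not observable
    return -numerator / denominator
```

The approximation holds only for displacements that are small compared
with the spatial detail of the image.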

An interframe coder using image segmentation and motion measurement
is described in [9], together with some experimental results. Each frame
is segmented into three areas, namely, stationary background, translating
objects, and areas which cannot be predicted from the previous frame, via
a tri-state MAP estimator. For stationary background and translating
objects, prediction is based on the corresponding pixels from the previous
frame, while for the remaining areas a spatial predictor is used.

A somewhat different approach to motion estimation and its
application to interframe coding has been recently published in [50]. Here
a pixel by pixel translational motion is recursively estimated and the
interframe prediction is based on the estimated motion-displaced location
in the previous field. The prediction error is cluster coded similar to
[13].


1.3.2 Transform Coding Techniques - The superior performance of
transform coding over other techniques for coding intraframe images is
well known. Its extension to interframe coding using 3-D transforms was
not attempted until recently, mainly because of the requirement of storing
several frames at the transmitter as well as at the receiver, resulting
in exorbitant memory costs. Recent developments in digital technology now
make it possible to store several image frames and thus make transform
coding feasible. Knauer [39] has reported some results on Hadamard
transform coding. He considers a block of 4 image frames at a time. Each
frame consists of two interlaced fields and contains 525 x 512 pixels,
each pixel originally quantized to 6 bits. This block of 4 frames is
divided into sub-blocks of size 4 x 4 x 4. Each sub-block is transformed
by a 3-D Hadamard transform (for definitions of various transforms used
in data compression, see [3,31,58]) and the transform coefficients are
truncated to 8 bits. To design the coder at a given bit-rate, a fixed
number of bits are distributed among various transform coefficients, a
majority of which are assigned no bits. The bit assignment has been
found by trial and error to give good visual quality. The coder can adapt
to motion by keeping high spatial resolution for stationary areas and
exchanging it for temporal resolution in moving areas.
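
A separable 3-D Hadamard transform of a cubic sub-block can be sketched in
Python as below. The unitary scaling, the natural (Sylvester) ordering of
the Hadamard rows, and the function names are assumptions made for this
example; the coefficient truncation and bit assignment of [39] are not
reproduced.

```python
import math


def hadamard_1d(values):
    """Unitary fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def hadamard_3d(block):
    """Apply the 1-D transform along each axis of an n x n x n nested list."""
    n = len(block)
    t = [[hadamard_1d(row) for row in frame] for frame in block]   # along samples
    for k in range(n):                                             # along lines
        for j in range(n):
            line = hadamard_1d([t[k][i][j] for i in range(n)])
            for i in range(n):
                t[k][i][j] = line[i]
    for i in range(n):                                             # along frames
        for j in range(n):
            line = hadamard_1d([t[k][i][j] for k in range(n)])
            for k in range(n):
                t[k][i][j] = line[k]
    return t
```

With this scaling the transform is its own inverse, so applying
`hadamard_3d` twice returns the original sub-block up to rounding.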

The transform coder of [39] lacks a mathematical analysis of bit
assignment, which is an important aspect of transform coding. Roese [67],
Roese, et al. [68] and Natarajan and Ahmed [49] have extended the
mathematical analysis of 2-D transform coding to three dimensions and have
also reported experimental results on interframe coding.
In [67,68] the interframe image random field is modeled as a
3-D, wide sense stationary first order Markov field with separable
covariance function in each dimension. The interframe image data is
divided into smaller 3-D sub-blocks and then transform coded independently.
The bit assignment is based on the separable covariance model and the
Shannon rate distortion bound for the quantizer. The transform samples are
quantized using a compander which performs very close to the optimum Max
quantizer. The distribution for each transform sample is assumed to be
Laplacian, except for the DC term, for which a Rayleigh distribution is
assumed. These distributions are reported in [67] to be good
approximations for image data. The variances of the transform samples are
found from the covariance model chosen. Theoretical performance of the
coder using the Cosine transform for various block sizes has also been
reported. The mean square error decreases as the block size increases, but
the complexity also increases. Experimental results on actual data are
also reported for the Cosine transform at various bit-rates for
a block size of 16 x 16 x 16.
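
A bit assignment derived from transform domain variances can be
illustrated with the standard log-variance rule that follows from the rate
distortion function of Gaussian sources. That [67,68] use exactly this
rule is an assumption; the Python sketch below and its function name are
illustrative only.

```python
import math


def allocate_bits(variances, average_bits):
    """Assign bits so that each coefficient gets the average number of bits
    plus half the log2 ratio of its variance to the geometric mean variance.

    All variances must be positive. The result is real valued and may be
    negative for very small variances; practical coders round the values
    and assign zero bits to such coefficients, which is not done here.
    """
    logs = [math.log2(v) for v in variances]
    mean_log = sum(logs) / len(logs)
    return [average_bits + 0.5 * (lv - mean_log) for lv in logs]
```

The assignments always sum to the number of coefficients times
`average_bits`, so the overall bit rate is preserved.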

Because multiframe data cannot be satisfactorily modeled by separable
covariance statistics, a model such as that described in [67] could yield
poor coding performance. But we do need the transform domain variances to
design a coder without resorting to trial and error. In [49] the authors
suggest calculating the 3-D covariance function on a portion of the image
data over a window of the same size as the sub-block and assuming the
random process to be wide sense stationary. From this, the transform
domain variances could be calculated by appropriately taking the
transform. The three dimensional sub-blocks are stored as one dimensional
arrays by lexicographic ordering to facilitate the addressing. A
Kronecker product of the transform matrices is then used to find the
equivalent 3-D transform of this array. The experimental results for a
block size of 4 x 4 x 4 for Cosine and Hadamard transforms are reported
at 1 bit/pixel for 4 MHz signals.
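The equivalence between a Kronecker-product transform of the lexicographically ordered array and a separable 3-D transform can be checked numerically. The Python sketch below is illustrative only: it uses a 2 x 2 x 2 block and an orthonormal 2-point Hadamard matrix for brevity (the experiments of [49] use 4 x 4 x 4 blocks), and all function names are hypothetical.

```python
def hadamard2():
    """Orthonormal 2 x 2 Hadamard matrix (illustrative choice of transform)."""
    s = 2 ** -0.5
    return [[s, s], [s, -s]]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matvec(mat, vec):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, vec)) for row in mat]


def flatten(block):
    """Lexicographic ordering of a 3-D block: the last index varies fastest."""
    return [x for plane in block for row in plane for x in row]


def transform_3d_separable(block, t):
    """Apply the 1-D transform t along each of the three axes in turn."""
    n = len(t)
    # Transform along axis 0 (frame index).
    out = [[[sum(t[k][a] * block[a][b][c] for a in range(n))
             for c in range(n)] for b in range(n)] for k in range(n)]
    # Transform along axis 1 (row index).
    out = [[[sum(t[k][b] * out[a][b][c] for b in range(n))
             for c in range(n)] for k in range(n)] for a in range(n)]
    # Transform along axis 2 (column index).
    out = [[[sum(t[k][c] * out[a][b][c] for c in range(n))
             for k in range(n)] for b in range(n)] for a in range(n)]
    return out
```

With `t3 = kron(kron(t, t), t)`, the product `matvec(t3, flatten(block))` should agree with `flatten(transform_3d_separable(block, t))` up to rounding, which is the property that lets a 3-D sub-block be handled as a 1-D array.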

1.3.3 Hybrid Coding Techniques - Because of difficulties in
modeling interframe image fields as well as the increased complexity of
the transform coders, transform/predictive or hybrid coding techniques
have also been investigated in [67,68]. These are extensions of the
intraframe hybrid coding described in [22]. Each frame is divided into
smaller equal size sub-blocks and each sub-block is transformed by a
unitary transform. Then a linear first order predictor is used in the
temporal direction. In a simple or non-adaptive scheme a separable first
order Markov model in each dimension is used. Based on this model the
transform domain variances and the optimum prediction coefficient are
easily found. Theoretical performance of this coder for various sub-block
sizes is reported. Another scheme, in which the local changes in the
statistics are taken into account by measuring the statistics at the
transmitter as well as the receiver (and using these statistics for
coding), has been reported with many experimental results. This scheme
has been called adaptive hybrid coding. Results for discrete Cosine and
Fourier transforms at various bit-rates have been reported together with
the effect of channel errors. The adaptive hybrid scheme shows a much
better performance than the non-adaptive hybrid coding and transform
coding schemes based on a 3-D separable model. Also, some methods and
experimental results for compensation of the camera motion have been
reported in [67].

1.4 Research Objectives:

In  the  broad  context  of  interframe  coding  with  emphasis  on  hybrid 
and  transform  techniques,  the  following  problems  have  been  identified 
and  solutions  proposed  together  with  experimental  results. 

(i) The transform and hybrid coders reported so far allocate an
equal number of bits to all the areas of an image. It is intuitively
obvious that the stationary areas with no interframe activity could be
transmitted with few or no bits, while those with more activity would
require more bits to transmit the changes. Our objective is to find ways
of classifying sub-blocks of images into classes of varying temporal and
spatial activity and of assigning bits to the various classes.

(ii) Although, in general, the interframe motion is difficult to
characterize, in most cases it could be approximated by linear
translation. Since hybrid coding is based on a block by block coding, we
investigate the methods of measuring translation on a block by block
basis and use it for data compression. A technique for frame
interpolation along motion trajectory will be investigated to achieve
higher data compression.

(iii)  Since  the  trajectory  of  motion  of  a  pixel  (or  a  block) 
cannot  be  estimated  perfectly,  we  consider  models  and  effects  of 
uncertainty  in  trajectory  estimation  for  data  compression.  We  would  also 
like  to  find  rate-distortion  curves  based  on  such  models. 

(iv) We investigate the problem of joint optimization of data
compression and channel encoding for minimizing the overall mean square
error for image transmission over noisy channels.

(v) As pointed out earlier, the interframe data compression
schemes have been applied only to video images. We look at another
potential area of application, data compression of biomedical x-ray
images.

(vi) The performance of the hybrid and transform coding methods
is dependent on the choice of the transform. We evaluate the relative
performance of various transforms for a variety of non-separable two
dimensional random fields which have been used for modeling image
covariance statistics. Previous results have considered only the separable
covariance model.


1.5 Description of Experimental Data Sets:


We  have  used  four  very  distinct  types  of  multiframe  image  data 
sets.  Two  of  these  data  sets  are  video  motion  images  obtained  from  the 
Naval  Ocean  Systems  Center,  San  Diego,  California.  The  other  two  data 
sets  are  x-ray  images  obtained  from  the  Biodynamic  Research  Unit,  Mayo 
Foundation,  Rochester,  Minnesota.  A  brief  description  of  these  data  is 
provided  below. 


(i) Head and Shoulders (H & S) - contains 16 sequential frames
of 16 mm, 24 frames/second, motion picture of a subject (Walter Kronkite)
against a stationary background in conversation, digitized to 256 x 256
pixels/frame, 8 bits/pixel.

(ii) Chemical Plant - an aerial view of a complex of buildings
and roads from a moving platform, 16 frames digitized to 256 x 256
pixels/frame, 8 bits/pixel. It contains a fourth of a 35 mm frame
digitized to 512 x 512 pixels/frame.

(iii) Angiocardiograms - 100 X-ray images of the left ventricle
of a human heart taken at intervals of 1/30 sec., after injection of a
contrast material in the blood, containing four complete heart beat
cycles. Each frame is digitized to 128 x 176 pixels with 8 bits/pixel.
Spatial resolution is .5 mm.

(iv) Projection Images - 120 X-ray projections of a dead dog's
thorax taken at intervals of 3° around an axis approximately through the
center of the thorax. Each image is digitized to 128 x 128 pixels with
8 bits/pixel. Spatial resolution is approximately 1 mm. These images are
used for 3-D reconstruction of the X-ray absorption densities of the
thorax.

Since each data set is digitized to 8 bits/pixel, the intensity
range of the original data is between 0 and 255. Therefore, for the
calculations of SNR, the peak-to-peak value of the signal has been taken
as 255, even though the actual peak-to-peak signal could be somewhat
smaller.
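As a minimal sketch of this convention, the Python fragment below computes an SNR in dB from the mean square error with the peak-to-peak value fixed at 255. The exact SNR definition is not spelled out at this point in the text; the peak-to-peak signal to rms noise form used here is an assumption, and the function names are hypothetical.

```python
import math


def mean_square_error(original, coded):
    """Mean square error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, coded)) / len(original)


def snr_db(original, coded, peak_to_peak=255.0):
    """Assumed definition: 10 log10(peak_to_peak^2 / MSE), in dB."""
    mse = mean_square_error(original, coded)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(peak_to_peak ** 2 / mse)
```

Because the true peak-to-peak range of a data set may be smaller than 255, a value computed this way would slightly overstate the SNR relative to one based on the actual signal range.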

1.6  Dissertation  Organization 

We start with the problem of modeling and understanding
the temporal characteristics of the motion images in Chapter II. There
we propose some methods of translational motion measurement on a block
by block basis. Then we analyze the effects of uncertainty in motion
estimation and define some parameters which give simple measures of this
uncertainty and are useful for developing data compression algorithms. We
also propose a method of data compression based on temporal subsampling
and interpolation of the missing frames along the motion trajectory.

In Chapter III we describe two interframe predictive schemes. One
of these schemes is reported in [13], and the other one is a simple
adaptive scheme based on the classification of motion. Results for both
these schemes have been presented primarily for the purpose of comparison
with the schemes of Chapters IV and V.

Figure 1-3: Some Frames of the Original Head and Shoulders (on the Left)
and Chemical Plant (on the Right) Data Sets. [Panels include Frame No. 12
and Frame No. 16.]

Chapter IV starts with the basic concepts of transform coding.
Then the results of intraframe 2-D transform coding are presented. They
form a good basis for comparing the interframe 3-D transform as well as
hybrid coding against intraframe coding and provide a measure of
compression gain due to interframe redundancy. Finally, results of some
non-adaptive and adaptive interframe transform coding are presented.

Chapter V presents non-adaptive and many adaptive hybrid coding
schemes and forms the major portion of the interframe coding schemes of
this thesis. Adaptive schemes include classification based on activity,
motion measurement and compensation, and temporal subsampling with
interpolation along motion trajectory.

In Chapter VI we present a new method of data compression for
transmission over noisy channels. This consists of joint optimization
of source and channel coding to reduce the overall MSE distortion in the
signal due to quantization and channel noise. Rate distortion curves for
coding of random variables and of one and two dimensional random
processes are given together with experiments on actual image data.

Summary and conclusions of our investigation as well as future
areas of research are reported in Chapter VII. Appendix A discusses the
problem of modeling 2-D image statistics together with some results.
Appendix B gives the results of comparisons of various unitary discrete
transforms used in data compression for some 2-D random fields.

CHAPTER II

MODELING,  MEASUREMENT,  AND  ANALYSIS  OF  TEMPORAL  CHARACTERISTICS 

The temporal characteristics of a sequence of images differ
considerably for various applications. The changes between two consecutive
frames basically have two components: deterministic and random. If a
pixel or a group of pixels in a current frame has a correspondence with
a pixel or a group of pixels in the preceding frame which can be
characterized by a deterministic function, then that function is the
deterministic component, and the residual value of the pixel in the
current frame after subtracting the deterministic component will be
called the random component.

In motion images, some common types of deterministic components
are linear translation or rotation of objects against a fixed background
in a scene, zooming and panning of the camera, linear and rotational motion
of the camera, etc. In practice, the interframe motion is a combination
of the above and various other motions which are not easy to characterize.

2.1  Motion  Characteristics  from  Temporal  Cross-Sections 

In a laboratory the interframe motion can be perceived by viewing
the images as a movie. We have considered an alternative way of
presenting the data so that the motion can be inferred by looking at the
images in a stationary mode. Since many image processing facilities
(including ours) do not have the capability to display interframe digital
data in real time, this method is useful in visual representation of
interframe motion by stationary images.

We select a line in any direction, say $\theta$, in the image plane
passing through the region of interest and store the pixels along that
line within the region of interest as a horizontal line of another image
$A_\theta$. Then we select lines from the successive frames at the same
spatial location and store them one below another at vertical sampling
intervals of the images. The resulting line sampled image $A_\theta$ over
the time period of interest is a temporal cross-section in the direction
$\theta$. The interframe motion could then be visually analyzed by viewing
several temporal cross-sections as follows.

If a pixel is undergoing linear translation along the direction
$\theta$, its path will appear as a curve on the plane of $A_\theta$. The
slope of the curve (with respect to the vertical axis) gives the velocity
of the pixel. The pixel intensities on this curve will be constant. In the
context of wave propagation, fluid flow, etc. (or more generally for
systems described by hyperbolic partial differential equations), these
curves are called the characteristics. If a region is undergoing linear
translation perpendicular to $\theta$, the image of that region will
appear on $A_\theta$ with a scaling factor along the vertical axis of
$A_\theta$ (the scaling factor depends upon the velocity of the region).
If there is a camera zoom, we will see lines converging or diverging along
the zoom axis. If an object in the region is rotating, we will see
sinusoidal traces. Such temporal cross-sections have been used earlier for
tomographic x-ray images in which the object is rotated at uniform speed
(by making an equivalence between time and angle), and they are called
sinograms [63].
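The construction of a temporal cross-section can be sketched in a few lines of Python. This is an illustrative sketch restricted to horizontal and vertical directions (a row or a column); frames are assumed to be lists of rows, and the function names and the synthetic moving-dot sequence are hypothetical.

```python
def temporal_cross_section_row(frames, row):
    """Stack the same image row from successive frames, one below another."""
    return [list(frame[row]) for frame in frames]


def temporal_cross_section_column(frames, col):
    """Stack the same image column from successive frames, one below another."""
    return [[image_row[col] for image_row in frame] for frame in frames]


def make_moving_dot(num_frames, height, width, row, start_col, velocity):
    """Synthetic sequence: one bright pixel translating along a row."""
    frames = []
    for t in range(num_frames):
        frame = [[0] * width for _ in range(height)]
        frame[row][start_col + velocity * t] = 255
        frames.append(frame)
    return frames
```

For a dot moving at a constant number of pixels per frame, the bright pixel moves by the same amount from each line of the cross-section to the next, so its trace is a straight line whose slope with respect to the vertical (time) axis equals the velocity.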

Figure 2-1 shows temporal cross-sections for the Head and Shoulders
and the Chemical Plant images along some spatial directions. From images
(a) and (b) we see down and up and right to left motion of the face in
the 16 frames of the Head and Shoulders. From images (c) and (d) we see
that for the Chemical Plant images the motion is almost purely vertically
downward: first slow, then rapid, and just before the end almost
stationary. We also notice that the slope of the motion trajectories in
(d)(iii) increases as we move from left to right, which means that the
bottom of the images is moving faster than the top. Since the ground
objects are stationary, this is possible only if the camera has motion
other than linear translation parallel to the ground.

Figure 2-1: Temporal Cross-Sections of 16 Frames of Head and Shoulders and
Chemical Plant Images. White lines on images in (a) and (c) show the
spatial locations of the cross-sections. (b) Temporal cross-sections of
the Head & Shoulders images along (i) Row #183, (ii) Row #185,
(iii) Column #127, (iv) Column #129, (v) Main Diagonal. (d) Temporal
cross-sections of the Chemical Plant images along (i) Row #170,
(ii) Row #172, (iii) Column #127, (iv) Column #129, (v) Main Diagonal.

2.2  Interframe  Motion  Trajectory  Estimation  and  Modeling: 

Let us assume that each point of a continuous image is undergoing
motion and appears at some location in an image at another time instant.
Let $u(x,y,t)$, a zero mean random variable, denote the intensity at the
$(x,y)$ coordinate of an image at time $t$. Let each image be a sample of a
2-D homogeneous stationary random process whose covariance is given by

$$E[u(x',y',t)\,u(x+x',y+y',t)] = \sigma^2 R(|x|,|y|), \tag{2-1}$$

where $E[\cdot]$ denotes the expectation, $|\cdot|$ denotes the absolute
value, and $\sigma^2$ is the variance of $u(x,y,t)$.

Let $(x+dx,\,y+dy,\,t+dt)$ be the new location of the point $(x,y,t)$.
Then the trajectory of motion is given by

$$u(x,y,t) = u(x+dx,\,y+dy,\,t+dt) = \text{Constant}. \tag{2-2}$$

Let the observed value of $u(x,y,t)$ be given by

$$v(x,y,t) = u(x,y,t) + \eta(x,y,t), \tag{2-3}$$

where $\eta$ is the observation noise, which is assumed to be white and
independent of $u$. Let $\eta$ have zero mean and variance
$\sigma_\eta^2$.

Now let us assume that the motion trajectory is estimated piecewise,
i.e., at discrete time instants. Let $\hat{dx}$ and $\hat{dy}$ be the
estimates of $dx$ and $dy$, respectively, and

$$\tilde{dx} = dx - \hat{dx}, \qquad \tilde{dy} = dy - \hat{dy}$$

be the motion estimation error. Figure 2-2 shows the concept of
trajectory approximation without and with motion estimation for the
component of the motion along the x-axis.

The motion compensated interframe estimate is given by

$$v^c(x+\hat{dx},\,y+\hat{dy},\,t+dt) = v(x,y,t), \tag{2-4}$$

where superscript c denotes motion compensation. The temporal
correlation after motion compensation is given by

$$\begin{aligned}
\rho^c_{dt} &= \frac{E[v^c(x+\hat{dx},y+\hat{dy},t+dt)\,v(x+\hat{dx},y+\hat{dy},t+dt)]}{E[v^2(x+\hat{dx},y+\hat{dy},t+dt)]} \\
&= \frac{E[v(x,y,t)\,v(x+\hat{dx},y+\hat{dy},t+dt)]}{E[\{u(x+\hat{dx},y+\hat{dy},t+dt)+\eta(x+\hat{dx},y+\hat{dy},t+dt)\}^2]} \\
&= \frac{1}{\sigma^2+\sigma_\eta^2}\,E[\{u(x,y,t)+\eta(x,y,t)\}\{u(x+\hat{dx},y+\hat{dy},t+dt)+\eta(x+\hat{dx},y+\hat{dy},t+dt)\}] \\
&= \frac{1}{\sigma^2+\sigma_\eta^2}\,E[u(x+dx,y+dy,t+dt)\,u(x+\hat{dx},y+\hat{dy},t+dt)] \\
&= \frac{1}{\sigma^2+\sigma_\eta^2}\,E[\sigma^2 R(|dx-\hat{dx}|,\,|dy-\hat{dy}|)]
\end{aligned}$$

or

$$\rho^c_{dt} = \frac{\sigma^2\,E[R(|\tilde{dx}|,|\tilde{dy}|)]}{\sigma^2+\sigma_\eta^2}$$

or

$$\rho^c_{dt} = \frac{E[R(|\tilde{dx}|,|\tilde{dy}|)]}{1+\sigma_\eta^2/\sigma^2}. \tag{2-5}$$

For small values of $\tilde{dx}$ and $\tilde{dy}$ most image covariance
functions could be assumed to be approximately linear functions of
$|\tilde{dx}|$ and $|\tilde{dy}|$, and the above could be approximated by

$$\rho^c_{dt} \approx \frac{R(E[|\tilde{dx}|],\,E[|\tilde{dy}|])}{1+\sigma_\eta^2/\sigma^2}. \tag{2-6}$$

Thus, from the distribution of $\tilde{dx}$ and $\tilde{dy}$ one can
obtain the temporal correlation, which may be used for interframe data
compression.

We now define another quantity, which we call motion compensated
interframe variance (MCIFV), as

$$\begin{aligned}
(\sigma^c_{dt})^2 &= E[\{v(x+\hat{dx},y+\hat{dy},t+dt) - v^c(x+\hat{dx},y+\hat{dy},t+dt)\}^2] \\
&= E[\{v(x+\hat{dx},y+\hat{dy},t+dt)\}^2] + E[\{v^c(x+\hat{dx},y+\hat{dy},t+dt)\}^2] \\
&\quad - 2E[v(x+\hat{dx},y+\hat{dy},t+dt)\,v^c(x+\hat{dx},y+\hat{dy},t+dt)],
\end{aligned}$$

so that

$$(\sigma^c_{dt})^2 = 2(\sigma^2+\sigma_\eta^2)(1-\rho^c_{dt}) \tag{2-7}$$

or

$$\rho^c_{dt} = 1 - \frac{(\sigma^c_{dt})^2}{2(\sigma^2+\sigma_\eta^2)}. \tag{2-8}$$

In the absence of motion compensation (2-5) and (2-7) become

$$\rho_{dt} = \frac{E[R(|dx|,|dy|)]}{1+\sigma_\eta^2/\sigma^2}, \tag{2-9}$$

$$\sigma_{dt}^2 = 2(\sigma^2+\sigma_\eta^2)(1-\rho_{dt}). \tag{2-10}$$

In the above discussion, the observation noise was included to show
how it affects temporal correlation. In our coding experiments and
future analysis, we assume that no observation noise is present.
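The relations between temporal correlation and interframe variance in (2-5) and (2-7) through (2-10) reduce to simple arithmetic once the signal variance, the noise variance, and the expected covariance value are known. The Python sketch below is only an illustration of that arithmetic; the function names are hypothetical, and any numbers passed to them are assumed inputs rather than measured data.

```python
def temporal_correlation(expected_r, signal_var, noise_var=0.0):
    """Form of (2-5) and (2-9): rho = E[R] / (1 + noise_var / signal_var)."""
    return expected_r / (1.0 + noise_var / signal_var)


def interframe_variance(rho, signal_var, noise_var=0.0):
    """Form of (2-7) and (2-10): 2 (signal_var + noise_var) (1 - rho)."""
    return 2.0 * (signal_var + noise_var) * (1.0 - rho)


def correlation_from_variance(variance, signal_var, noise_var=0.0):
    """Form of (2-8): rho = 1 - variance / (2 (signal_var + noise_var))."""
    return 1.0 - variance / (2.0 * (signal_var + noise_var))
```

With the noise variance set to zero, as assumed for the coding experiments, the correlation equals the expected value of R and the interframe variance is twice the signal variance times one minus that correlation.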

2.3  Motion  Measurement  Techniques 

In this section we describe methods of measuring interframe
motion for digitized images with particular emphasis on data compression.

First, we approximate the interframe motion by piecewise linear
translation of one or more areas of a frame relative to a reference
frame. The segmentation of an image into areas, each of which is
undergoing approximately the same linear translation, and the measurement
of the magnitude and the direction of the linear translation of each area
are difficult tasks. Cafforio and Rocca [11] describe a method for
segmentation and measurement of the linear displacement of a single moving
object in a stationary background. An extension of the method to more
than one moving object has also been shown. This method becomes
increasingly complex as the number of moving areas increases and the size
of the image grows larger. There is another difficulty with such a method
of segmentation if the information of segmentation and linear translation
is to be coded. Coding of segmentation with arbitrary boundaries would
require a complex scheme and, moreover, the length of the code will be
large.

A simpler method is to segment an image into fixed size smaller
rectangular areas and to assume that each of these areas is undergoing
independent linear translation. If these areas are small enough, rotation,
zooming, etc. of larger objects can be closely approximated by piecewise
linear translation of these smaller areas. Also, it avoids the problem
of coding the segmentation information. Only the displacement vector of
each of the areas needs to be transmitted. Another simplification is to
restrict the motion measurement to an integer number of pixels. This
would give an accuracy to within .5 pixel in the moving areas. Since
in practice the motion is not an ideal linear translation, an effort to
estimate the displacement vector to a fraction of a pixel will not
result in significant improvement in prediction.

A method which has been used for the measurement of linear shift
between two given images, particularly for aerial guidance, is area
correlation [59,85]. This consists of calculating the area correlation
function of the two images. The location of the peak of the correlation
function gives the displacement vector. The area correlation function is
usually calculated via the fast Fourier transform (FFT). To improve the
accuracy of this method some filtering or preprocessing of the images is
required, which could be done in the spatial domain [59] or the Fourier
domain [85].

For the purpose of piecewise linear translation measurement we
divide an image into smaller rectangular areas, which we call sub-blocks,
and correlate them with the appropriate areas of the reference image.
Let $U$ be an $M \times N$ size sub-block of an image and $U_R$ be an
$(M+2p) \times (N+2p)$ sub-block of the reference image, centered at the
same spatial location as $U$, where $p$ is the maximum displacement
allowed in integer number of pixels in either direction. Then the area
correlation function is given by

$$C(i,j) = \sum_{m=1}^{M}\sum_{n=1}^{N} u^f(m,n)\,u_R^f(m+i,\,n+j), \qquad -p \le i,j \le p, \tag{2-11}$$

where superscript f denotes filtering or preprocessing in the spatial
or Fourier domain. A simple spatial operator which has been found to be
useful in area correlation is a four point Laplacian operator given by

$$u^f(m,n) = u(m,n) - \{u(m-1,n) + u(m+1,n) + u(m,n-1) + u(m,n+1)\}/4, \qquad \forall\, m,n.$$

Let $V$ and $V_R$ be the discrete Fourier transforms of $U$ and $U_R$,
respectively; then a Fourier domain filter given by

$$|v^f(m,n)| = |v(m,n)|^{\gamma}, \qquad 0 \le \gamma \le 1, \tag{2-12}$$

where $|\cdot|$ represents the magnitude, has been found to be useful [85].
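A minimal Python sketch of area correlation with the four point Laplacian prefilter follows. It is an illustration under stated assumptions, not the implementation used in the experiments: the correlation is evaluated by direct summation rather than by FFT, border pixels of the filtered image are simply set to zero, the sub-block is addressed by its top-left pixel within full frames, and all function names are hypothetical.

```python
def laplacian_filter(img):
    """Four point Laplacian prefilter; border pixels are left at zero (an assumption)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for m in range(1, h - 1):
        for n in range(1, w - 1):
            neighbours = img[m - 1][n] + img[m + 1][n] + img[m][n - 1] + img[m][n + 1]
            out[m][n] = img[m][n] - neighbours / 4.0
    return out


def area_correlation(cur_f, ref_f, top, left, M, N, p):
    """C(i, j) for -p <= i, j <= p, computed by direct summation.

    cur_f and ref_f are prefiltered full frames; the search window is
    assumed to lie entirely inside ref_f.
    """
    c = {}
    for i in range(-p, p + 1):
        for j in range(-p, p + 1):
            c[(i, j)] = sum(cur_f[top + m][left + n] * ref_f[top + m + i][left + n + j]
                            for m in range(M) for n in range(N))
    return c


def correlation_peak(c):
    """Displacement vector at which the correlation function peaks."""
    return max(c, key=c.get)
```

For larger sub-blocks an FFT-based evaluation would normally be preferred for speed; direct summation is used here only to keep the sketch short and dependency-free.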

We have found that the performance of the area correlation method
is poor for smaller sub-block sizes, areas of low spatial activity, and
for sub-blocks not undergoing pure linear translation. We have found
another method which does significantly better under most circumstances
for interframe image motion estimation. This method requires a search for
the direction of minimum distortion (or DMD) and is described below.

Let us define a mean distortion function between $U$ and $U_R$ as

$$D(i,j) = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} g\{u(m,n) - u_R(m+i,\,n+j)\}, \qquad -p \le i,j \le p, \tag{2-13}$$

where $g\{x\}$ is a given positive and increasing distortion function of
$x$, e.g. $g\{x\} = x^2$ would correspond to $D(i,j)$ as mean square error
function. The direction of minimum distortion is given by $(i,j)$ such
that $D(i,j)$ is minimum.
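The definition above can be turned directly into an exhaustive search over all candidate directions. The following Python sketch assumes full frames stored as lists of rows, a sub-block addressed by its top-left pixel, and a search window that stays inside the reference frame; the function names are hypothetical.

```python
def mean_distortion(cur, ref, top, left, M, N, i, j, g=lambda x: x * x):
    """D(i, j) for the M x N sub-block whose top-left pixel is (top, left)."""
    total = 0
    for m in range(M):
        for n in range(N):
            total += g(cur[top + m][left + n] - ref[top + m + i][left + n + j])
    return total / (M * N)


def full_search_dmd(cur, ref, top, left, M, N, p, g=lambda x: x * x):
    """Evaluate all (2p+1)^2 directions; return (best direction, its distortion)."""
    best_dir, best_val = None, None
    for i in range(-p, p + 1):
        for j in range(-p, p + 1):
            d = mean_distortion(cur, ref, top, left, M, N, i, j, g)
            if best_val is None or d < best_val:
                best_dir, best_val = (i, j), d
    return best_dir, best_val
```

The default `g` gives the mean square error; passing `g=abs` would give a mean absolute difference instead, which is also a positive and increasing function of the magnitude of its argument.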

One difficulty with finding DMD as stated above is that it requires
evaluation of $D(i,j)$ for $(2p+1) \times (2p+1)$ directions, and even for
motions up to 5 pixels along either side of the axes, one has to search
121 directions. We have found a solution to overcome the above difficulty
by making an assumption that if

$$D_0(q,\ell) = \min_{i,j}\,\{D(i,j)\}$$

then for $m = i-q$, $n = j-\ell$, the functions

$$\begin{aligned}
D_1(|m|,|n|) &= D(i,j) - D_0(q,\ell), \qquad m \ge 0,\ n \ge 0 \\
D_2(|m|,|n|) &= D(i,j) - D_0(q,\ell), \qquad m \ge 0,\ n \le 0 \\
D_3(|m|,|n|) &= D(i,j) - D_0(q,\ell), \qquad m \le 0,\ n \le 0 \\
D_4(|m|,|n|) &= D(i,j) - D_0(q,\ell), \qquad m \le 0,\ n \ge 0
\end{aligned}$$

are nondecreasing functions of $|m|$ and $|n|$, i.e., for
$m,n,m',n' \ge 0$, $1 \le k \le 4$,

$$D_k(m,n) \le D_k(m',n'), \quad \text{if } m \le m' \text{ and } n \le n'.$$

For $g\{x\} = x^2$ the above is equivalent to the assumption that the
covariance function of images is a nonincreasing function. Most image
covariance functions satisfy this condition, at least in a small
neighborhood of the origin.

Having made the above assumption, we use a 2-D directed search
method, which is similar to the binary or logarithmic search [90] in one
dimension. The search is accomplished by successively reducing the area
of search to half or less. Each step consists of searching five directions,
which contain the center of the area and the midpoints between the center
and the four boundaries of the area along the x,y axes passing through
the center. This procedure continues until the plane of search reduces
to a 3 x 3 size. In the final step all the 9 directions are searched and
the location corresponding to the minimum is the DMD. The algorithm is
given below.

For any integer $m > 0$, we define

$$\mathcal{N}(m) = \{(i,j)\ ;\ -m \le i,j \le m\}$$

$$\mathcal{M}(m) = \{(0,0),\,(m,0),\,(0,m),\,(-m,0),\,(0,-m)\}.$$

Logarithmic Search Procedure for DMD:

Step 1: (initialization)
    $D(i,j) = \infty$, $(i,j) \notin \mathcal{N}(p)$
    $n' = \lfloor \log_2 p \rfloor$
    $n = \max\{2,\ 2^{n'-1}\}$
    $q = \ell = 0$ (or an initial guess for DMD)
where $\lfloor\ \rfloor$ is a lower integer truncation function.

Step 2: $\mathcal{M}'(n) \leftarrow \mathcal{M}(n)$.

Step 3: Find $(i,j) \in \mathcal{M}'(n)$ such that $D(i+q,\,j+\ell)$ is
minimum. If $i=0$ and $j=0$, go to Step 5; otherwise go to Step 4.

Step 4: $q \leftarrow q+i$, $\ell \leftarrow \ell+j$;
$\mathcal{M}'(n) \leftarrow \mathcal{M}'(n) - (-i,-j)$; go to Step 3.

Step 5: $n \leftarrow n/2$. If $n=1$, go to Step 6; otherwise, go to
Step 2.

Step 6: Find $(i,j) \in \mathcal{N}(1)$ such that $D(i+q,\,j+\ell)$ is
minimum. $q \leftarrow i+q$, $\ell \leftarrow \ell+j$. $(q,\ell)$ then
gives the DMD.

Figure 2-3 illustrates the search procedure for p = 5.
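The six steps can be sketched in Python as follows. This is a reading of the procedure under stated assumptions, not a reference implementation: the distortion is supplied as a callable `distortion(i, j)`, directions outside the allowed range are treated as having infinite distortion, ties are resolved in favour of the current center, and the function name and the cache used to count evaluations are hypothetical additions.

```python
import math


def log_search_dmd(distortion, p, start=(0, 0)):
    """2-D logarithmic search; returns (direction, number of directions evaluated)."""
    cache = {}

    def d(i, j):
        # Step 1: directions outside the allowed range count as infinitely bad.
        if abs(i) > p or abs(j) > p:
            return math.inf
        if (i, j) not in cache:
            cache[(i, j)] = distortion(i, j)
        return cache[(i, j)]

    q, l = start
    n_prime = p.bit_length() - 1                  # floor(log2 p) for p >= 1
    n = max(2, 1 << max(n_prime - 1, 0))          # max{2, 2^(n' - 1)}
    while n > 1:
        # Step 2: start from the full five-point pattern at spacing n.
        candidates = [(0, 0), (n, 0), (0, n), (-n, 0), (0, -n)]
        while True:
            # Step 3: best of the remaining candidates around the current center.
            i, j = min(candidates, key=lambda c: d(q + c[0], l + c[1]))
            if (i, j) == (0, 0):
                break
            # Step 4: move the center and drop the direction pointing back.
            q, l = q + i, l + j
            candidates = [c for c in candidates if c != (-i, -j)]
        n //= 2                                   # Step 5
    # Step 6: search all nine directions of the final 3 x 3 neighbourhood.
    neighbourhood = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
    i, j = min(neighbourhood, key=lambda c: d(q + c[0], l + c[1]))
    return (q + i, l + j), len(cache)
```

The search is only guaranteed to reach the true minimum when the distortion satisfies the monotonicity assumption stated earlier; on a surface with several local minima it may stop at one of them.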


The figure above shows the concept of the 2-D logarithmic search to find
a pixel in another frame which is registered with respect to the pixel
(i,j) of a given frame, such that the mean-square error over a block
defined around (i,j) is minimized. The search is done step by step, with
markers indicating the directions searched at the step number marked. The
numbers circled show the optimum directions for that search step and the *
shows the final optimum direction, (i-3, j+1) in the above example. This
procedure requires searching only 13-21 directions for the above grid as
opposed to 121 total possibilities.

Figure 2-3: A 2-D Logarithmic Search Procedure for the Direction of
Minimum Distortion.

2.4 Motion Measurement Results

The method of DMD motion measurement, discussed in the previous
section, was applied to the Head and Shoulders and the Chemical Plant
data. The distortion function $g(x) = x^2$ was used so that DMD would
correspond to minimum mean square error in registration of the sub-blocks.
A sub-block size of 16 x 16 was chosen. The sizes 16 x 16 and 32 x 32
were found to be good compromises between accuracy of piecewise linear
translational approximation of the motion, the cost of transmitting
displacement vectors, and the complexity of data compression schemes using
motion measurement.

For the multiframe data, when the reference image is a neighboring
frame of the image relative to which motion measurement is done, the
quantity $D(0,0)$ will be called interframe variance. Once the DMD for a
sub-block has been found, the area of the reference image in the direction
of DMD is taken as the motion compensated estimate of the sub-block. By
collecting all the motion compensated estimates from a reference frame,
one obtains the motion compensated reference frame. If $(q,\ell)$ is the
displacement vector of the DMD, then $D(q,\ell)$ is the interframe
variance with motion compensation and $D(0,0)$ is the interframe variance
without motion compensation for that sub-block. These quantities for a
frame are obtained by averaging them over all the sub-blocks.
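The construction of the motion compensated reference frame and the frame-level averages can be sketched as below. The Python code assumes square sub-blocks addressed by their top-left pixel, a dictionary `vectors` mapping each sub-block position to its displacement vector (for example as found by a DMD search), and displaced areas that stay inside the reference frame; these conventions and the function names are hypothetical.

```python
def block_variance(cur, ref, top, left, size, q, l):
    """Mean square difference between a sub-block and the reference area displaced by (q, l)."""
    total = 0
    for m in range(size):
        for n in range(size):
            total += (cur[top + m][left + n] - ref[top + m + q][left + n + l]) ** 2
    return total / (size * size)


def motion_compensated_frame(ref, vectors, size):
    """For every sub-block, copy the reference area lying in its direction of minimum distortion."""
    out = [row[:] for row in ref]
    for (top, left), (q, l) in vectors.items():
        for m in range(size):
            for n in range(size):
                out[top + m][left + n] = ref[top + m + q][left + n + l]
    return out


def frame_interframe_variance(cur, ref, vectors, size):
    """Average D(0,0) and D(q,l) over all sub-blocks listed in `vectors`."""
    count = len(vectors)
    without_mc = sum(block_variance(cur, ref, t, c, size, 0, 0)
                     for (t, c) in vectors) / count
    with_mc = sum(block_variance(cur, ref, t, c, size, q, l)
                  for (t, c), (q, l) in vectors.items()) / count
    return without_mc, with_mc
```

Pixels not covered by any listed sub-block are left as in the original reference frame, which is a simplification of this sketch.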

When the reference frame is the same as the current image itself,
$D(i,j)$ computed over the complete frame gives the average interframe
variance as a function of the uniform linear translation vector $(i,j)$.
This quantity gives a rough estimate of the average motion between two
frames. Figure 2-4 gives the interframe variance as a function of linear
translation for the Head and Shoulders and the Chemical Plant data.

Figures 2-5 and 2-6 show the results of motion measurement for the
central 256 x 128 portion of a frame for the Head and Shoulders and the
Chemical Plant data, respectively, relative to the preceding frame. Part
(a) shows the displacement vectors for each of the 16 x 16 sub-blocks.
The success of the DMD motion location method seems evident from Figure
2-6(a). The image is known to have a vertically downward motion relative
to the previous frame as well as a geometric distortion such that the top
of the image undergoes a smaller displacement and the bottom a larger
displacement [67, p. 102]. We can see that our scheme mostly predicts
a successively larger motion as one moves from the top to the bottom,
as expected.

Parts (b) and (c) of Figures 2-5 and 2-6 show the interframe
variance (IFV) without and with motion compensation (MC) for various
sub-blocks. For the Head and Shoulders data (Figure 2-5) there are very
wide variations among sub-blocks in IFV without motion compensation. This
is due both to a variation in motion (from nearly stationary to more than
3-4 pixels/frame) and to the spatial activity (from a pixel to pixel
correlation of .99 to less than .8) as a result of non-stationarity. After
motion compensation the variation narrows down significantly and is mostly
due to the spatial activity. For Chemical Plant (Figure 2-6) the variation
without motion compensation is not as wide because there are no stationary
areas.

Figure 2-6: Results of Motion Measurement Relative to the Previous Frame
on a Portion of Chemical Plant Frame No. 12 for Sub-block Size
of 16 x 16.

Comparing the average values of the IFV with motion compensation
and the entries of Figure 2-4, we can compute an estimate of the average
motion uncertainty. We assume that the motion uncertainty is identically
distributed along both the image dimensions and that, for small motion,
the IFV is a linear function of motion in pixels. With this assumption
we compare the IFV with the diagonal entries of Figure 2-4 and find the
average motion by interpolation. For example, the average IFV with motion
compensation for the Chemical Plant frame No. 12 is 139.34 (Figure 2-6(c)).
Comparing this value with the diagonal entries of Figure 2-4(a) we find
that it lies between 0 and 572.96. Thus, by interpolation we obtain an
average value of motion uncertainty after motion compensation of (.25, .25).
Similarly, for the Head and Shoulders data we find it to be approximately
(.1, .1). Observing that for the Head and Shoulders data more than half
the image area is nearly stationary, the average motion uncertainty in the
moving areas can be approximated as (.25, .25) pixel. This means that the
DMD method indeed measures the motion with an accuracy up to the nearest
integer pixel most of the time. Thus, the absolute value of the motion
uncertainty in the moving areas can be modeled as uniformly distributed
between 0 and .5 pixel along each of the dimensions, giving an average
value of .25.
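
The interpolation above can be checked numerically. The following is a
minimal sketch, assuming (as the text implies) that the diagonal entries of
Figure 2-4(a) at displacements (0,0) and (1,1) are 0 and 572.96; the
function name is illustrative and does not come from the original work.

```python
def motion_uncertainty(ifv_mc, ifv_at_0, ifv_at_1):
    """Linearly interpolate the displacement (in pixels, taken equal along
    both axes) at which the uncompensated IFV curve would equal ifv_mc."""
    return (ifv_mc - ifv_at_0) / (ifv_at_1 - ifv_at_0)


# Chemical Plant frame No. 12: average IFV after motion compensation.
estimate = motion_uncertainty(139.34, 0.0, 572.96)
print(round(estimate, 2))  # 0.24, which the text rounds to a quarter pixel

# Mean of the absolute error if it is uniform between 0 and 0.5 pixel.
uniform_mean = (0.0 + 0.5) / 2
```

The estimate of about a quarter of a pixel agrees with the mean of an error
that is uniformly distributed between 0 and .5 pixel.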

Table 2-1 shows the improvement in the interframe prediction due
to motion compensation. With no motion compensation, the prediction of a
pixel is the value of the pixel in the previous frame having the same
spatial location, whereas with motion compensation the predicted value
comes from the previous frame pixel in the direction of minimum distortion.
Note that the variance of this error is nothing but the IFV. An interesting
and important observation is that there is a wide frame to frame variation
in IFV without motion compensation due to variation in motion activity as
a function of time (4.64 dB between Head and Shoulders frames 6 and 7).
After motion compensation this variation becomes negligible, showing
that the uncertainty in motion prediction is identically distributed over
different frames even though the motion itself is not. Figures 2-7(a)
and 2-8(a) show interframe prediction and error images.

TABLE 2-1

IMPROVEMENT IN SNR OF INTERFRAME PREDICTION ERROR (IFPE) DUE TO MOTION
COMPENSATION. MOTION MEASUREMENT WAS DONE ON 16 x 16 SIZE SUB-BLOCKS
USING DIRECTION OF MINIMUM DISTORTION SEARCH WITH MEAN SQUARE CRITERION.

                              SNR OF IFPE IN DECIBELS
DATA SET           FRAME   Without Motion   With Motion    Improvement
                   No.     Compensation     Compensation
HEAD & SHOULDERS     6         29.90           35.88           5.98
                     7         25.26           35.68          10.42
                     8         26.18           36.30          10.12
                     9         26.03           36.26          10.23
CHEMICAL PLANT      11         16.66           26.77          10.11
                    12         16.90           26.69           9.79
                    13         17.53           26.56           9.03

TABLE 2-2

IMPROVEMENT IN SNR OF INTERPOLATED FRAME (FROM THE PRECEDING AND THE
FOLLOWING FRAMES OF THE ORIGINAL DATA) DUE TO MOTION COMPENSATION.

                              SNR OF INTERPOLATED FRAME IN DECIBELS
FRAME                         Without Motion   With Motion    Improvement
                              Compensation     Compensation
HEAD & SHOULDERS FRAME #8         30.48           38.61           8.15
CHEMICAL PLANT FRAME #12          19.34           29.56          10.22

The error images shown in Figures 2-7 and 2-8 and elsewhere show
the absolute value of the errors amplified and truncated to the largest
value of 255. The darker points show larger errors. The amplification
for the Head and Shoulders data is ten times and for the Chemical Plant
data it is five times. Only about three-fourths of each error image
has been shown for these data sets.

2.5  Frame  Repetition  and  Interpolation  Along  Motion  Trajectory 

Frame skipping is one of the simplest methods of data compression
for interframe motion images. For simplicity of discussion, we assume
skipping of every alternate frame. However, the discussion could be
easily extended to other cases. A skipped frame is generally reproduced
either by repeating the preceding frame or by interpolation from the
preceding and the following frames. Both these methods have serious effects
on the quality of motion reproduction. The former results in jerkiness
in the reproduction of the motion and the latter in blurring of the moving
areas. This is evident by looking at part (i) of Figures 2-7(b) and 2-8(b).

Let u_{2k} be a sub-block of the (2k)th frame, where frames 2, 4, ...,
2k, ... have been skipped. Then the reproduced value of u_{2k}, denoted
u^*_{2k}, is obtained (without motion compensation) as follows.

Frame Repetition:

    u^*_{2k}(m,n) = u_{2k-1}(m,n)                                     (2-14)

Frame Interpolation:

    u^*_{2k}(m,n) = \frac{1}{2} \{ u_{2k-1}(m,n) + u_{2k+1}(m,n) \} .  (2-15)

The disadvantages of simple frame repetition or interpolation can
be overcome by using motion compensation, i.e., making the prediction or
interpolation along the motion trajectory. Using motion compensation,
(2-14) and (2-15) are replaced by

    u^*_{2k}(m,n) = u_{2k-1}(m+q, n+l)                                (2-16)

and

    u^*_{2k}(m,n) = \frac{1}{2} \{ u_{2k-1}(m+q, n+l) + u_{2k+1}(m+q', n+l') \}   (2-17)

respectively, where (q,l) and (q',l') are the coordinates of the
displacement vectors of u_{2k} relative to the preceding and the following
frames, respectively.
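
The difference between plain interpolation and interpolation along the
motion trajectory can be illustrated with a short sketch. It assumes an
exhaustive block-matching search with a mean square criterion, which is a
simpler stand-in for the direction of minimum distortion search used in
this work, and a synthetic scene that moves one pixel downward per frame.
All function names and the test pattern are hypothetical.

```python
def block_mse(cur, ref, top, left, size, q, l):
    """Mean square difference between a block of `cur` and the block of
    `ref` displaced by (q, l)."""
    total = 0.0
    for m in range(top, top + size):
        for n in range(left, left + size):
            diff = cur[m][n] - ref[m + q][n + l]
            total += diff * diff
    return total / (size * size)


def best_displacement(cur, ref, top, left, size, max_disp):
    """Exhaustive search (an assumption, not the DMD search of the text)
    for the displacement minimising the block mean square error."""
    best, best_err = (0, 0), block_mse(cur, ref, top, left, size, 0, 0)
    for q in range(-max_disp, max_disp + 1):
        for l in range(-max_disp, max_disp + 1):
            err = block_mse(cur, ref, top, left, size, q, l)
            if err < best_err:
                best, best_err = (q, l), err
    return best


def interpolate_block(prev, nxt, top, left, size,
                      d_prev=(0, 0), d_next=(0, 0)):
    """Plain temporal interpolation for zero displacements, interpolation
    along the motion trajectory otherwise."""
    (q, l), (q2, l2) = d_prev, d_next
    return [[0.5 * (prev[m + q][n + l] + nxt[m + q2][n + l2])
             for n in range(left, left + size)]
            for m in range(top, top + size)]


def pattern(m, n):
    # Hypothetical smooth test pattern without repeated structure.
    return m * m + 3 * n * n + m * n


def frame(shift, rows=16, cols=16):
    # The scene moves `shift` pixels downward.
    return [[pattern(m - shift, n) for n in range(cols)] for m in range(rows)]


prev, cur, nxt = frame(1), frame(2), frame(3)   # cur is the skipped frame
top, left, size = 6, 6, 4
d_prev = best_displacement(cur, prev, top, left, size, max_disp=2)
d_next = best_displacement(cur, nxt, top, left, size, max_disp=2)
truth = [row[left:left + size] for row in cur[top:top + size]]
compensated = interpolate_block(prev, nxt, top, left, size, d_prev, d_next)
plain = interpolate_block(prev, nxt, top, left, size)
```

For this purely translational scene the compensated block reproduces the
skipped block exactly, while plain interpolation does not.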

The improvement in SNR of the interframe prediction error shown
in Table 2-1 and Figures 2-7(a) and 2-8(a) is nothing but the improvement
due to frame repetition along motion trajectory compared with a simple
frame repetition. Table 2-2 and Figures 2-7(b) and 2-8(c) show the
improvement due to the frame interpolation along the motion trajectory
compared with a simple interpolation along the temporal axis.

Thus, we see that the approximation of the motion by linear
translation, on a sub-block by sub-block basis, gives excellent results for
the video motion images considered. These results could be used with
interframe predictive coding schemes, such as DPCM, and still better
with hybrid coding schemes (as discussed in chapter V) with a great
improvement in coder performance. The results of chapter V show an
improvement in compression gain by a factor of two when motion
compensation and frame interpolation along motion trajectory are used. Even
higher gains are expected with further increase in the sampling interval
along the temporal axis (i.e., skipping more frames) and interpolation of
missing frames along motion trajectory.

CHAPTER III

INTERFRAME PREDICTIVE CODING

In section 1.3.1 we briefly discussed several predictive
techniques. One of these techniques, called frame replenishment with
cluster coding, is described in the following section. In section 3.2
we report a simple predictive scheme which we call adaptive
classification prediction. This has been developed primarily for comparison
with other schemes. These schemes have been simulated and applied to the
Head and Shoulders images.

3.1  Frame  Replenishment  with  Cluster  Coding 

This technique was developed at the Bell Telephone Laboratories
and is described by Candy, et al. [13]. We have implemented it for
comparison purposes. This technique mainly consists of transmitting the
addresses and quantized amplitudes of the "significant" interframe
differences of the consecutive frames. The interframe difference at any pixel
location is classified as significant when its absolute value exceeds a
fixed threshold. The experimental observation that most of the
significant interframe differences occur in clusters along any frame line
suggests that the addresses of the significant interframe differences
could be efficiently coded by transmitting the beginning address of a
cluster and a cluster terminator code. It is obvious that most of the
clusters will appear in the areas of an image consisting of moving edges
or objects. This implies that the technique would generate a variable
bit-rate for each frame, depending upon the activity and motion in the
frame. Thus, transmission of the data on a channel with a fixed bit-rate
would necessitate a buffer.


To avoid an arbitrarily large buffer requirement, some controls
(not without the penalty of higher distortion), which limit the buffer
requirement to a given buffer length, are applied. In our simulations we
keep the buffer capacity equal to the average number of coded bits
(desired bit rate) per frame. All controls are determined by the number of
bits residing in the buffer. Figure 3-1 describes the buffer control
levels for this scheme.

If the contents of the buffer fall below point A, the next line is
transmitted as it is, using 8 bits/pixel, to prevent buffer underflow.
When the buffer contents exceed point C, the frame differences in a cluster
are subsampled, i.e., every other frame difference is transmitted and at
the receiver the missing value is interpolated. The sub-sampling continues
until the buffer contents fall below point B. When the buffer contents
exceed points C, D and E, the threshold for classification of significant
changes is increased successively to lower the number of significant
changes. When the buffer is filled beyond point F, coding is stopped for
one frame period and sub-sampling is continued for the next frame period.
This is done to avoid buffer overflow.
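
The control rules above amount to a small decision function of the buffer
fullness. The sketch below is only illustrative: the positions of the
points A to F are given in Figure 3-1 and are not reproduced in the text,
so the fractions used here are hypothetical, as are all names.

```python
from dataclasses import dataclass

# Hypothetical control levels as fractions of the buffer capacity; the
# actual positions of points A-F are those of Figure 3-1.
LEVELS = {"A": 0.10, "B": 0.40, "C": 0.50, "D": 0.65, "E": 0.80, "F": 0.95}


@dataclass
class Control:
    send_pcm_line: bool    # send the next line at 8 bits/pixel (underflow guard)
    subsample: bool        # send every other frame difference in a cluster
    threshold_steps: int   # number of successive threshold increases in force
    stop_coding: bool      # suspend coding for one frame period (overflow guard)


def buffer_control(fill, was_subsampling, levels=LEVELS):
    """Map the buffer fullness (0..1) to the control actions of section 3.1."""
    # Sub-sampling starts above C and persists until the contents fall below B.
    subsample = fill > levels["C"] or (was_subsampling and fill >= levels["B"])
    # One threshold increase for each of the points C, D and E exceeded.
    steps = sum(fill > levels[name] for name in ("C", "D", "E"))
    return Control(fill < levels["A"], subsample, steps, fill > levels["F"])


print(buffer_control(0.45, was_subsampling=True).subsample)   # True
print(buffer_control(0.45, was_subsampling=False).subsample)  # False
```

The two calls at the end show the hysteresis between points B and C: the
same fullness gives different decisions depending on whether sub-sampling
was already in progress.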

In the beginning, the first three lines of the first frame are
force updated, i.e., they are transmitted at 8 bits/pixel. In the next
frame, the next three lines are force updated and so on, except when the
coder is in buffer overflow condition. At this rate a complete frame is
refreshed or updated approximately every 3.5 seconds for 24 frames/sec
transmission. Nine-bit codes are used to designate the start of a new
frame, the start of a new line, and the starting addresses of the clusters
in a given line. Isolated points of significant change are removed,
and clusters which are very close are bridged, to reduce
the bit-rate. For further details, see [13].
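
The cluster formation described in this section can be sketched for a
single scan line as follows. This is an illustration under stated
assumptions rather than the procedure of [13]: the threshold, the maximum
gap that is bridged, the minimum cluster length, and the order of the two
clean-up steps (bridging first, then removal of isolated points) are all
hypothetical choices.

```python
def find_clusters(diff_line, threshold, max_gap, min_len):
    """Return (start, end) index pairs of clusters of significant
    interframe differences along one line."""
    significant = [abs(d) > threshold for d in diff_line]

    # Collect runs of consecutive significant differences.
    runs, start = [], None
    for idx, flag in enumerate(significant):
        if flag and start is None:
            start = idx
        elif not flag and start is not None:
            runs.append([start, idx - 1])
            start = None
    if start is not None:
        runs.append([start, len(significant) - 1])

    # Bridge runs separated by at most max_gap insignificant pixels.
    bridged = []
    for run in runs:
        if bridged and run[0] - bridged[-1][1] - 1 <= max_gap:
            bridged[-1][1] = run[1]
        else:
            bridged.append(run)

    # Drop clusters shorter than min_len (isolated points).
    return [(a, b) for a, b in bridged if b - a + 1 >= min_len]


line = [0, 0, 9, 8, 0, 7, 0, 0, 0, 0, 0, 9, 0, 0, 0, 6, 7, 8]
clusters = find_clusters(line, threshold=4, max_gap=1, min_len=2)
print(clusters)  # [(2, 5), (15, 17)]
```

Only the start address of each cluster and a terminator would then be
transmitted, together with the quantized differences inside the cluster.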

3.2  Adaptive  Classification  Prediction  Coding 

As pointed out in the last chapter, if a point moves more than one
pixel per frame and if its direction of motion is unknown, then (for highly
correlated images) spatial prediction performs better than pure temporal
prediction. On the other hand, for stationary pixels a temporal prediction
is preferable.

We have developed a very simple criterion to classify a pixel as
stationary or slowly moving (about 1 pixel/frame in any direction) or
rapidly moving relative to the previous frame. This could be easily
implemented online and is based on the interframe differences of its
nearest neighbors in the present frame as shown in Figure 3-2. Only in
the case of slow motion do we approximate any kind of motion or change by
a translatory motion, and we search its direction assuming that it came
from one of the nearest neighbors in the previous frame as shown in
Figure 3-3. The direction of the rapid motion is of no consequence to
us as we rely on spatial prediction in this case. A block diagram of
the scheme is shown in Figure 3-4.

Let u_{k,i,j} denote the intensity of the jth pixel on the ith scan
line of the kth frame and u^*_{k,i,j} be its reconstructed value at the
receiver with no channel errors.

Figure 3-4: An Adaptive Classification Prediction Scheme.

Figure 3-5: Buffer Control Levels for the Adaptive Classification
Prediction Scheme.

3.2.1 Motion Predictor - As established in the last chapter, the
motion of a pixel is quite close to that of its nearest neighbors and the
interframe difference is a good measure of the extent of the motion.

In  order  to  classify  motion  we  measure,  over  a  neighborhood  of  the  pixel, 
a  weighted  sum  of  the  absolute  interframe  difference.  A  small  neighbor¬ 
hood  would  be  sensitive  to  the  quantization  error,  noise,  etc.,  and  a 
large  neighborhood  requires  more  computations  and  would  not  respond 
quickly  to  the  changes  in  motion.  Keeping  that  in  mind,  the  neighborhood 
of Figure 3-2 was chosen. Also we have chosen the absolute value of the
interframe difference signal as opposed to its square to reduce the
sensitivity to large quantization errors. Based on these criteria, we define
a neighborhood activity index A_{k,i,j} as

    A_{k,i,j} = \sum_{(x,y) \in \mathcal{N}} w_{x,y} | u^*_{k,i+x,j+y} - u^*_{k-1,i+x,j+y} |     (3-1)

where w_{x,y} are the weights and \mathcal{N} the pixel neighborhood, and

    w_{x,y} \ge 0                                                      (3-2)

    \mathcal{N} = \{ (0,-p); (-1,-1); (-1,0); (-1,1) \}                (3-3)

    p = \begin{cases} 2, & \text{if coder is in 2:1 sub-sampling mode} \\
                      1, & \text{otherwise.} \end{cases}               (3-4)

Then u_{k,i,j} is assigned to one of the three classes, c_S (stationary),
c_M (slow motion) or c_R (rapid motion), as follows.

    u_{k,i,j} \in \begin{cases}
        c_S, & \text{if } A_{k,i,j} \le L_1 \text{ or } A_{k,i,j} \le L_2 \text{ and } u_{k,i,j-1} \in c_S \\
        c_R, & \text{if } A_{k,i,j} \ge L_4 \text{ or } A_{k,i,j} \ge L_3 \text{ and } u_{k,i,j-1} \in c_R \\
        c_M, & \text{otherwise}
    \end{cases}

where L_1 \le L_2 \le L_3 \le L_4 are predetermined thresholds. Note that we
have chosen elastic thresholds between the classes. In order to jump from
c_S to c_M or c_R we have a higher threshold (L_2) than to jump from c_M or
c_R to c_S (L_1). The converse is true for transition from and to class c_R.
This reduces the sensitivity of classification to quantization and
other noise.
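
The activity index and the elastic (hysteresis) thresholds can be sketched
as below. The threshold values and the weights are those reported in
section 3.2.6; the function names, the single-letter class labels "S", "M"
and "R", and the choice of "M" as the class assumed to the left of the
first pixel are hypothetical.

```python
def activity_index(cur, prev, i, j, p=1):
    """Weighted sum of absolute interframe differences over the causal
    neighbourhood {(0,-p), (-1,-1), (-1,0), (-1,1)} of pixel (i, j).
    `cur` and `prev` are reconstructed frames; the same-line neighbour
    has weight 2 and the others weight 1."""
    weights = {(0, -p): 2, (-1, -1): 1, (-1, 0): 1, (-1, 1): 1}
    return sum(w * abs(cur[i + x][j + y] - prev[i + x][j + y])
               for (x, y), w in weights.items())


def classify(a, left_class, l1=10, l2=14, l3=50, l4=70):
    """Three-way classification with hysteresis on the class of the
    previous pixel on the same line."""
    if a <= l1 or (a <= l2 and left_class == "S"):
        return "S"   # stationary
    if a >= l4 or (a >= l3 and left_class == "R"):
        return "R"   # rapid motion
    return "M"       # slow motion


cur = [[10, 10, 10], [10, 10, 0]]
prev = [[10, 13, 10], [12, 10, 0]]
a = activity_index(cur, prev, i=1, j=1)
print(a)  # 7

# Effect of the elastic thresholds along a line of activity values.
labels, left = [], "M"
for value in [8, 12, 12, 16, 12, 60, 60, 75, 60, 40]:
    left = classify(value, left)
    labels.append(left)
print("".join(labels))  # SSSMMMMRRM
```

An activity of 12 keeps a pixel in the stationary class when its left
neighbour is stationary but does not move it into that class otherwise,
and likewise an activity of 60 sustains rapid motion without starting it.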


3.2.2 Intensity Prediction - Based on the classification of
motion, the prediction of u_{k,i,j}, denoted by \bar{u}_{k,i,j}, is given by

    \bar{u}_{k,i,j} = \begin{cases}
        u^*_{k-1,i,j},      & \text{if } u_{k,i,j} \in c_S \\
        u^*_{k-1,i-q,j-r},  & \text{if } u_{k,i,j} \in c_M \\
        \text{2-D causal spatial prediction with coefficients } \rho_i, \rho_j,
                            & \text{if } u_{k,i,j} \in c_R
    \end{cases}                                                        (3-5)

where \rho_i and \rho_j are one step correlation coefficients along i and j
respectively, and the pair (q,r) \in \mathcal{N}' is chosen so as to minimize

    B_{k,i,j}(q,r) = \sum_{(x,y) \in \mathcal{N}} w'_{x,y} | u^*_{k,i+x,j+y} - u^*_{k-1,i+x+q,j+y+r} | ,

where

    w'_{x,y} \ge 0

and

    \mathcal{N}' = \{ (s,t); s,t = -1,0,1; (s,t) \ne (0,0) \}          (3-6)

and \mathcal{N} and p are given by (3-3) and (3-4). The above simply means
that for class c_M we search the direction of the nearest neighbors of
Figure 3-3 which minimizes the index B_{k,i,j}. Note that for a pixel
classified as having rapid motion we have used a 2-D causal spatial predictor
based on the separable first order Markov covariance model.

3.2.3  Subsampling  -  The  fact  that  in  moving  areas  spatial 
resolution  can  be  traded  off  for  temporal  resolution  [57]  could  be 
utilized  to  achieve  more  compression  by  subsampling  the  images  in  the 
moving  area  in  conjunction  with  the  buffer  contents.  The  intensity  of 
the  subsampled  pixels  is  obtained  by  linear  interpolation. 

3.2.4  Quantization  and  Coding  -  The  prediction  errors  for  dif¬ 
ferent  classes  are  quantized  using  different  quantizers.  In  order  to 
achieve  a  rate  very  close  to  the  entropy  of  the  quantizer  output  symbols, 
a  group  of  quantizer  symbols  of  fixed  length  is  coded  at  a  time  and  a 
binary  code  is  generated  using  the  Huffman  coding  algorithm  [1]. 

3.2.5 Buffer Length Control - To limit the buffer requirement to
a reasonable size, the quantizer levels are changed as a function of the
number of bits residing in the buffer. The levels of the quantizers
are changed so as to decrease the entropy of the output symbols as the
buffer contents increase and vice versa. On the average the entropy of
each of the quantizers is matched to that of the desired transmission
rate (or compression ratio). To prevent buffer overflow or underflow,
we use the same technique as used in [13] and also described in section 3.1.

3.2.6 Simulation Parameters - We have simulated the above scheme
to achieve a compression ratio (C.R.) of 16 or a bit-rate of .5 bit/pixel
for the Head and Shoulders data. The buffer length was chosen to store one
coded frame at this rate, i.e., .5 x 256 x 256 bits. The weights of
(3-2) and (3-6) were all selected to be unity except that w_{0,-p} was
chosen to be two. This was done since this is the only pixel corresponding
to the current line and is known to be not interpolated. Of the remaining
three pixels, which belong to the previous line, as many as two could be
interpolated pixels, which have higher errors.

The values of the classification thresholds were determined
experimentally to minimize the prediction error (for each class) and were
found to be L_1 = 10, L_2 = 14, L_3 = 50 and L_4 = 70. However, they could
also be found by finding the expected value of A_{k,i,j} for transition from
one class to another. This would require the knowledge of the covariance
function of the image and the probability distributions of the interframe
difference signal, the interframe motion and the quantizer noise.

To exchange temporal and spatial resolution, a 2:1 subsampling was
done for classes c_M and c_R. For simplicity, the values of \rho_i and
\rho_j in (3-5) were chosen to be unity. The buffer control levels are
shown in Figure 3-5.

The input and output levels of various quantizers are shown in
Table 3-1. We have chosen a set of alphabets C = {c_1, c_2, c_3} for the
output of the quantizer Q_S. As we are doing a 2:1 subsampling for
classes c_M and c_R, we have decided to choose the symbols for Q_M and Q_R
from C x C to simplify the design of the binary encoder. Wherever
more than one symbol is available for an output, only one is sent at a
time and each takes its turn in a fixed order: the first symbol is
transmitted the first time and the next symbol the next time, after which
the cycle is repeated. At the decoder, both of
these codes are decoded into the same output level of the quantizer. The
entropy of the symbol set is .48 bits. We code a group of four alphabets
at a time using the Huffman code [1] and achieve an average rate of .52 bit/
alphabet, which is very close to the entropy and also our desired bit
rate of .5 bit/pixel, as each pixel generates one alphabet.
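
The benefit of coding the alphabets in groups of four can be illustrated
with a short calculation. The sketch assumes independent symbols and a
hypothetical probability assignment for the three alphabets (the measured
probabilities are not given in the text), and compares the average Huffman
code length per alphabet with the entropy.

```python
import heapq
import itertools
import math


def entropy(probs):
    """Entropy in bits per symbol of a memoryless source."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_average_length(probs):
    """Average codeword length of a binary Huffman code. It equals the sum
    of the weights of all internal nodes created while merging."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def block_probabilities(probs, group):
    """Probabilities of all groups of `group` independent symbols."""
    return [math.prod(combo)
            for combo in itertools.product(probs, repeat=group)]


symbol_probs = [0.92, 0.04, 0.04]   # hypothetical, not from Table 3-1
h = entropy(symbol_probs)
rate1 = huffman_average_length(symbol_probs)
rate4 = huffman_average_length(block_probabilities(symbol_probs, 4)) / 4
print(h, rate1, rate4)
```

Coding one alphabet at a time cannot go below 1 bit/alphabet, whereas the
rate for groups of four is guaranteed to lie within .25 bit of the entropy.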

3.3  Results  and  Comparisons: 

Table  3-2  shows  the  performance  of  the  frame  replenishment  cluster 
coding  scheme  for  the  Head  and  Shoulders  data.  It  is  assumed  that  the 
first  frame  is  available  without  any  distortion  at  the  receiver  as  well 
as  the  transmitter.  The  stationary  area  corresponds  to  those  pixels  of 
a  frame  which  are  classified  as  insignificant  changes  from  the  previous 
frame  and  the  moving  area  corresponds  to  the  significant  changes.  Buffer 
overflow  area  corresponds  to  the  area  of  a  frame  which  is  repetition  of 
the  previous  frame  after  the  buffer  contents  exceed  point  F  in  Fig.  3-1. 

Figures 3-6(a) and 3-6(b) show the resulting images at bit-rates
of .5 bit/pixel and 1 bit/pixel, respectively, assuming no transmission
channel errors. As is evident from Table 3-2, the scheme performs
poorly at .5 bit/pixel since over about 60% of the image area the contents
of the previous frame are repeated. Thus, the motion would be reproduced
with jerkiness. The temporal lag is evident from Fig. 3-6(a) where the
areas of the image correspond to different frames in the original data.

The performance of the adaptive classification prediction scheme
is shown in Table 3-3 and Figure 3-6(c). Although the SNR is much better
than that of the cluster coding scheme, there is visibly poor spatial
resolution in the moving areas.

TABLE 3-2

FRAME REPLENISHMENT CLUSTER CODING RESULTS FOR HEAD AND SHOULDERS DATA.

                        .5 Bit/Pixel                1 Bit/Pixel
IMAGE AREA          % OF TOTAL AREA    S/N      % OF TOTAL AREA    S/N
Stationary               29.40      38.6  dB         68.66      39.09 dB
Moving                   11.38      31.19 dB         23.61      30.14 dB
Buffer-Overflow          58.52      25.90 dB          6.61      25.48 dB
TOTAL                   100.00      27.88 dB        100.00      33.0  dB
TABLE 3-3

ADAPTIVE CLASSIFICATION PREDICTION CODING OF HEAD AND SHOULDERS DATA,
BIT RATE = 0.5.

IMAGE AREA          % OF TOTAL AREA    S/N
Stationary               55.05      39.09 dB
Slow Motion              37.15      33.97 dB
Rapid Motion              7.76      27.77 dB
TOTAL                   100.00      34.75 dB

The schemes described above have the advantage that they require
a very low storage capacity for the image data, about 1 image line, as
well as have low computational complexity.

CHAPTER IV

TRANSFORM CODING TECHNIQUES


The  theory  of  transform  coding  of  images  can  be  found  in  [28,31,40, 
58,77].  For  a  brief  description  see  section  1.3. 

Figure 4-1 shows a block diagram of a simple 3-D transform coding
scheme for multiframe images. For practical reasons of data manipulation
and management (in hardware or software), the 3-D data array is first
divided into smaller arrays called sub-blocks. Let U be one such array
of size L x M x N and let V denote its transform. Since for a sub-block
size of L x M x N, L image frames need to be stored, the data array
containing L frames will be referred to as a block. Hence further division
of this data has been referred to here as a sub-block. The three dimensional
discrete unitary linear transforms that we consider are separable in
three dimensions, analogous to the three dimensional Fourier transform of
a continuous function f(x,y,z), viz.,

    F(\omega_1,\omega_2,\omega_3)
      = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
          f(x,y,z) e^{-j(\omega_1 x + \omega_2 y + \omega_3 z)} dx dy dz
      = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty}
          f(x,y,z) e^{-j\omega_1 x} dx \right] e^{-j\omega_2 y} dy \right\} e^{-j\omega_3 z} dz .

For the discrete array U, {u(k,i,j); 1 \le k \le L, 1 \le i \le M, 1 \le j \le N},
the analogous 3-D transformation is

    v(l,m,n) = \sum_{k=1}^{L} \sum_{i=1}^{M} \sum_{j=1}^{N}
                 u(k,i,j) \Psi_L(l,k) \Psi_M(m,i) \Psi_N(n,j),
               1 \le l \le L, 1 \le m \le M, 1 \le n \le N             (4-1a)

where \Psi_M is an M x M transform matrix for an M x 1 vector. Because of
the separability of this transformation in each dimension, (4-1a) can be
written as a sequence of three one dimensional transformations

    v_1(l,i,j) = \sum_{k=1}^{L} u(k,i,j) \Psi_L(l,k);    1 \le l \le L, 1 \le i \le M, 1 \le j \le N,

    v_2(l,m,j) = \sum_{i=1}^{M} v_1(l,i,j) \Psi_M(m,i);  1 \le l \le L, 1 \le j \le N, 1 \le m \le M,
                                                                        (4-1b)
    v_3(l,m,n) = \sum_{j=1}^{N} v_2(l,m,j) \Psi_N(n,j);  1 \le l \le L, 1 \le m \le M, 1 \le n \le N,

    v(l,m,n) = v_3(l,m,n) .

For an arbitrary \Psi, the number of operations would be LMN(L+M+N). If
\Psi is a fast transform, such as the fast Fourier transform (FFT) [8], then
the operation count is reduced to the order of LMN \log_2(LMN).
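
The equivalence of the direct triple sum and the sequence of three one
dimensional transformations can be verified on a small array. The sketch
below assumes the orthonormal cosine transform (DCT-II) as the unitary
transform and uses zero-based indices; the function names and the test data
are hypothetical.

```python
import math


def dct_matrix(size):
    """Orthonormal DCT-II matrix, used as an example of a unitary Psi."""
    rows = []
    for m in range(size):
        scale = math.sqrt((1.0 if m == 0 else 2.0) / size)
        rows.append([scale * math.cos((2 * i + 1) * m * math.pi / (2 * size))
                     for i in range(size)])
    return rows


def transform_separable(u, psi_l, psi_m, psi_n):
    """Three successive 1-D transformations: about LMN(L+M+N) products."""
    big_l, big_m, big_n = len(u), len(u[0]), len(u[0][0])
    v1 = [[[sum(u[k][i][j] * psi_l[l][k] for k in range(big_l))
            for j in range(big_n)] for i in range(big_m)] for l in range(big_l)]
    v2 = [[[sum(v1[l][i][j] * psi_m[m][i] for i in range(big_m))
            for j in range(big_n)] for m in range(big_m)] for l in range(big_l)]
    v3 = [[[sum(v2[l][m][j] * psi_n[n][j] for j in range(big_n))
            for n in range(big_n)] for m in range(big_m)] for l in range(big_l)]
    return v3


def transform_direct(u, psi_l, psi_m, psi_n):
    """Direct triple sum: about (LMN)^2 products."""
    big_l, big_m, big_n = len(u), len(u[0]), len(u[0][0])
    return [[[sum(u[k][i][j] * psi_l[l][k] * psi_m[m][i] * psi_n[n][j]
                  for k in range(big_l) for i in range(big_m)
                  for j in range(big_n))
              for n in range(big_n)] for m in range(big_m)]
            for l in range(big_l)]


# A small hypothetical 2 x 3 x 4 sub-block.
u = [[[(k + 1) * (i + 2) + j * j - 3 * k * j for j in range(4)]
      for i in range(3)] for k in range(2)]
psi = [dct_matrix(s) for s in (2, 3, 4)]
v_sep = transform_separable(u, *psi)
v_dir = transform_direct(u, *psi)
max_diff = max(abs(v_sep[l][m][n] - v_dir[l][m][n])
               for l in range(2) for m in range(3) for n in range(4))
energy_u = sum(x * x for plane in u for row in plane for x in row)
energy_v = sum(x * x for plane in v_sep for row in plane for x in row)
```

Both routes give the same coefficients, and because the transform is
unitary the energy of the sub-block is preserved.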

Each sample of the transform array, called a transform coefficient,
is generally quantized independently by a zero memory quantizer. The
overall coder efficiency is maximized (with respect to the mean square
error criterion) when the transform coefficients are uncorrelated (which
is a property of the optimum Karhunen-Loeve transform [40,76]). The
quantizer design depends on the probability distribution of the transformed
samples. Experimentally, for the Cosine transform, the samples v(l,m,n)
have been modeled quite well for image data by the Laplacian density
model, Roese [67]. The average mean square distortion is

    D = \frac{1}{LMN} \sum_{l} \sum_{m} \sum_{n} E[ \{ v^*(l,m,n) - v(l,m,n) \}^2 ] .


If we define the quantizer characteristics as

    q(x) = mean square quantizer error for a unit variance input
           random variable quantized to x bits                         (4-5)

we can write

    D = \frac{1}{LMN} \sum_{l} \sum_{m} \sum_{n} \sigma_v^2(l,m,n) q(b_{l,m,n}),   (4-6)

where b_{l,m,n} = number of bits allocated to the coefficient v(l,m,n).

Now, if the total number of bits available is fixed, i.e.,

    \sum_{l} \sum_{m} \sum_{n} b_{l,m,n} = LMN b                       (4-7)

where b = average bit rate in bits per sample, the overall distortion is
minimized by finding the optimal bit allocation among the various samples
such that the distortion D given by (4-6) is minimized. Since

    b_{l,m,n} \ge 0                                                    (4-8)

(4-6) is to be minimized subject to the constraints of (4-7) and (4-8).
Another desirable constraint is to require

    b_{l,m,n} = integer                                                (4-9)

The above minimization can be performed by a simple integer programming
algorithm, originally due to Fox [89]. Jain and Wang [86] have applied
this algorithm for finding integer bit allocation in hybrid coding of
intraframe images for several practical quantizers. (For other methods
of approximate bit allocation see [28,70].) Once the bit allocations are
known, each transform sample v(l,m,n) is quantized to b_{l,m,n} bits.
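
An integer bit allocation satisfying (4-7) to (4-9) can be sketched with a
greedy marginal-return procedure, which gives one bit at a time to the
coefficient whose distortion drops the most. This is offered in the spirit
of the integer programming approach cited above, not as a transcription of
it. The quantizer characteristic q(x) = 2^{-2x} is an assumed model, and
the variances are hypothetical.

```python
import heapq


def q_model(bits):
    """Assumed quantizer characteristic q(x) = 2^(-2x); practical
    quantizers would use tabulated values instead."""
    return 2.0 ** (-2 * bits)


def allocate_bits(variances, total_bits, q=q_model):
    """Give out total_bits one at a time, each to the coefficient with the
    largest reduction in variance-weighted quantizer error. For a convex,
    decreasing q this greedy rule minimises the sum in (4-6)."""
    bits = [0] * len(variances)
    heap = [(-var * (q(0) - q(1)), idx) for idx, var in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        _, idx = heapq.heappop(heap)
        bits[idx] += 1
        b = bits[idx]
        heapq.heappush(heap, (-variances[idx] * (q(b) - q(b + 1)), idx))
    return bits


def distortion(variances, bits, q=q_model):
    """Average distortion of (4-6) for a given allocation."""
    return sum(var * q(b) for var, b in zip(variances, bits)) / len(variances)


variances = [16.0, 5.0, 1.0, 0.25]               # hypothetical
bits = allocate_bits(variances, total_bits=4)    # average of 1 bit/sample
print(bits)  # [2, 2, 0, 0]
print(distortion(variances, bits), distortion(variances, [1, 1, 1, 1]))
```

With the same total of four bits, the adapted allocation gives a lower
distortion than one bit for every coefficient.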

As we can see from the above analysis, a transform coder design
requires the knowledge of the transform coefficient variances,
\sigma_v^2(l,m,n). For a simple transform coding scheme with a fixed
quantizer for each of the transform coefficients, the multiframe images are
assumed to be wide sense stationary (although, as pointed out in chapter II,
this is a very poor approximation for the motion images). With this
assumption, it can be easily seen that \sigma_v^2(l,m,n) can be obtained
from the knowledge of the covariance function of the array U (see [49,67]).
One approach has been to model the covariance by some simple function, e.g.,
as a product of first order stationary Markov process covariance models [67]
defined by

    r(l,m,n) = E[ u(k',i',j') u(k'+l, i'+m, j'+n) ]                    (4-10a)
             = \sigma^2 \rho_k^{|l|} \rho_i^{|m|} \rho_j^{|n|}         (4-10b)

where \sigma^2 is the variance of the data sample u(k,i,j) and \rho_k,
\rho_i and \rho_j are the one step correlation parameters along the indices
k, i, and j, respectively. An alternative approach is to measure the
covariance function on a portion of the data [49] similar to the intraframe
case as in (4-5) and use these for the rest of the data. However, the
transform domain statistics, i.e., \sigma_v^2(l,m,n), can be directly
estimated from the transform coefficients. In this case the estimate is
given by

    \hat{\sigma}_v^2(l,m,n) = \frac{1}{N_s} \sum v^2(l,m,n),
               1 \le l \le L, 1 \le m \le M, 1 \le n \le N,            (4-11)

where N_s is the number of data sub-blocks used in the calculation of
(4-11) and the summation is taken over all these sub-blocks.
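
The two ways of obtaining the coefficient variances can be sketched as
follows. The first function evaluates the variances implied by a first
order Markov covariance along one dimension; for the separable model
(4-10b) and a separable transform, the 3-D variance is the product of three
such terms. The second function forms the measured estimate by averaging
squared coefficients over sub-blocks. The cosine transform, the value of
the correlation parameter and all names are assumptions made for
illustration.

```python
import math


def dct_matrix(size):
    """Orthonormal DCT-II matrix, used as an example of a unitary Psi."""
    rows = []
    for m in range(size):
        scale = math.sqrt((1.0 if m == 0 else 2.0) / size)
        rows.append([scale * math.cos((2 * i + 1) * m * math.pi / (2 * size))
                     for i in range(size)])
    return rows


def markov_variances(psi, rho):
    """Diagonal of Psi R Psi^T for the unit variance covariance
    R[a][b] = rho^|a-b| along one dimension."""
    size = len(psi)
    return [sum(psi[m][a] * psi[m][b] * rho ** abs(a - b)
                for a in range(size) for b in range(size))
            for m in range(size)]


def separable_variances(var_k, var_i, var_j, sigma2=1.0):
    """Model variances of the 3-D coefficients as a product of 1-D terms."""
    return [[[sigma2 * a * b * c for c in var_j] for b in var_i]
            for a in var_k]


def measured_variances(blocks):
    """Average of the squared coefficients over a list of transformed
    sub-blocks, each an L x M x N nested list."""
    count = len(blocks)
    big_l, big_m, big_n = (len(blocks[0]), len(blocks[0][0]),
                           len(blocks[0][0][0]))
    return [[[sum(v[l][m][n] ** 2 for v in blocks) / count
              for n in range(big_n)] for m in range(big_m)]
            for l in range(big_l)]


line_vars = markov_variances(dct_matrix(8), rho=0.95)   # hypothetical rho
model = separable_variances(line_vars, line_vars, line_vars)
estimate = measured_variances([[[[1.0, 2.0]]], [[[3.0, 0.0]]]])
print(estimate)  # [[[5.0, 2.0]]]
```

For a highly correlated source most of the variance is concentrated in the
lowest order coefficient, which is what makes unequal bit allocation
effective.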

The array of quantized transform coefficients, \hat{V}, is encoded and
transmitted or stored. Its received value, V^*, is inverse transformed
(by interchanging U and V and replacing \Psi with \Psi^{-1} in (4-1)) to
obtain the reproduced value of U as U^*. In the coding experiments of this
chapter and the next chapter we assume the channel to be noise free, i.e.,

    V^* = \hat{V}  \Rightarrow  U^* = \hat{U} .

For  the  discussion  on  coding  for  noisy  channels,  refer  to  chapter  VI. 

4.1  Adaptive  Interframe  Transform  Coding  Schemes 

In the transform coding scheme discussed above, the multiframe
images were modeled as a 3-D stationary random process (in the wide sense).
In reality, the multiframe motion images are nonstationary, in general.
The nonstationarity exists in the spatial as well as the temporal
dimensions, and the latter appears to be the more severe of the two.
This is because the temporal direction is deterministically related
(except for the noise due to camera and digitization) to the spatial
coordinates as discussed in chapter II. Hence, a stationary random process
characterization is not a realistic assumption. The nonstationarity in
the spatial domain is mainly due to the presence of sharp object edges
(or features) within an image frame.

The encoding of multiframe motion images with the assumption of
stationarity, therefore, results in large degradations in the sharp
features within a frame, and in the reproduction of motion features.
Usually, these are the more desirable features.

Some researchers have developed methods of coding single frame
images by separating the nonstationarities (such as edges) and coding
them separately [75,83]. The residual image (after separating or
subtracting edges) is then modeled accurately by a stationary process and
can be coded using a simple transform coding scheme such as described
above (the intraframe or 2-D transform coding is a special case of the
interframe scheme with L = 1). However, these methods result in increased
complexity and their extension to the interframe (or 3-D) transform
coding seems difficult.

We have investigated the possibility of some simple extensions of transform coding which would improve its performance by accounting, in some way, for the nonstationarity. We have found that the concept of "activity index" proposed by Gimlet [21] for intraframe transform coding can be extended to interframe coding by finding a modified activity index. As we have seen in chapter II, the interframe variance (or IFV) is a good measure of the combined spatial and temporal activity between two successive frames of multiframe motion images. Thus, an average of the IFV measured over a sub-block between each pair of successive frames, given by

    \sigma^2 = \frac{1}{(L-1)MN} \sum_{k=2}^{L} \sum_{i=1}^{M} \sum_{j=1}^{N} \{u(k,i,j) - u(k-1,i,j)\}^2        (4-12)

could be used as a good measure for the activity index.
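As an illustration of (4-12), the following Python sketch computes the activity index of a single sub-block. The function name `activity_index`, the nested-list layout block[k][i][j] and the sample data are assumptions made here for illustration only; they are not taken from the experiments.

```python
def activity_index(block):
    """Average interframe variance of a sub-block, following (4-12).

    block is a list of L frames, each a list of M rows of N pixel values,
    so block[k][i][j] corresponds to u(k+1, i+1, j+1) in the text.
    """
    L, M, N = len(block), len(block[0]), len(block[0][0])
    if L < 2:
        raise ValueError("need at least two frames to form a frame difference")
    total = 0.0
    for k in range(1, L):                      # k = 2..L in the text
        for i in range(M):
            for j in range(N):
                d = block[k][i][j] - block[k - 1][i][j]
                total += d * d
    return total / ((L - 1) * M * N)


# Illustrative data (not from the thesis): a static block and a block
# whose brightness rises by 2 grey levels per frame.
static = [[[10, 10], [10, 10]] for _ in range(4)]
ramp = [[[2 * k, 2 * k], [2 * k, 2 * k]] for k in range(4)]
print(activity_index(static), activity_index(ramp))
```

A block with no frame-to-frame change gives a zero index, while any uniform change of d grey levels per frame gives d squared, which is why the index responds to temporal as well as spatial activity.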

In [21] the adaptation is achieved by classifying a two dimensional sub-block into one of 4 classes, based on the value of the activity index (which is nothing but the variance of the sub-block), by a threshold classifier. The thresholds are chosen such that each class has equal occurrences in an image frame. Each class is assigned a different but fixed number of total bits (obviously a class having a higher activity index is assigned more bits). Thus this scheme operates at a fixed bit-rate per frame. We would like to point out that what is essentially being achieved by this adaptation is the approximation of a nonstationary process by 4 piecewise stationary processes. Also note that for each class the thresholds would vary from one image frame to another with the distribution of motion, making the piecewise stationary approximation poorer because, for the same class, the range of the activity index is no longer fixed. Therefore, we have chosen fixed thresholds for classification.

We  also  choose  4  classes.  The  selection  of  number  of  classes  is  a 
trade-off  between  performance  and  complexity.  The  classification  for 
each  sub-block  is  coded  using  2  bits.  For  each  class  separate  bit-rates 
and  statistics  are  used.  Once  the  statistics  for  each  class  are  known  (or 
measured)  the  bit-rates  could  be  determined  from  the  distortion-rate 
curves  by  fixing  the  distortion  level  for  each  class.  The  activity  index 
(or  IFV)  thresholds  for  classification  depend  on  the  nature  of  the  data 
and  the  sub-block  size.  Their  suitable  values  can  be  found  from  the  histo¬ 
gram  of  the  activity  index. 
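The fixed-threshold classification described above can be sketched as follows. The threshold values in `THRESHOLDS` are hypothetical placeholders (the text only states that suitable values are found from the histogram of the activity index), and the helper names are assumptions of this sketch.

```python
from bisect import bisect_right

# Hypothetical fixed thresholds on the activity index; real values would be
# read off the histogram of the activity index for the data at hand.
THRESHOLDS = [4.0, 16.0, 64.0]


def classify(activity, thresholds=THRESHOLDS):
    """Return the class number 1..4 for a sub-block's activity index."""
    return bisect_right(thresholds, activity) + 1


def class_code(cls):
    """Two-bit side information identifying one of the four classes."""
    return format(cls - 1, "02b")


activities = [0.5, 4.0, 20.0, 300.0]
classes = [classify(a) for a in activities]
codes = [class_code(c) for c in classes]
print(list(zip(activities, classes, codes)))
```

Because the thresholds are fixed, the range of activity covered by a class is the same in every frame; only the number of sub-blocks falling into each class, and hence the bit-rate per frame, varies.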

In the adaptive scheme described above, both the bit-rate and statistics were adapted for each class. However, if desired, one of them could be kept constant at the cost of only a partial improvement over the usual (or non-adaptive) scheme.

4.2  Experimental  Results 

The adaptive and non-adaptive interframe transform coding methods of the previous sections as well as the usual (or non-adaptive) intraframe transform coding were applied to some of the data sets described in chapter I.

Due to the superior performance of the cosine transform for data compression of highly correlated data (see appendix B and [33,49,67]) we have chosen the transform matrix Ψ throughout this work as the discrete cosine transform (or DCT) matrix (see Ahmed et al. [2]), defined as

    \psi(i,j) = \begin{cases} \sqrt{1/M}, & i = 1,\ 1 \le j \le M \\ \sqrt{2/M}\,\cos\dfrac{(2j-1)(i-1)\pi}{2M}, & 2 \le i \le M,\ 1 \le j \le M \end{cases}        (4-13)
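A minimal sketch of the DCT matrix follows, assuming the standard form of Ahmed et al.: a constant first row equal to the square root of 1/M, and the remaining rows equal to the square root of 2/M times cos[(2j-1)(i-1)π/2M]. The function names are illustrative. The check at the end confirms that the rows are orthonormal, which is what allows the inverse transform to use the transposed matrix.

```python
import math


def dct_matrix(M):
    """Rows of the M x M discrete cosine transform matrix (1-based i, j)."""
    psi = []
    for i in range(1, M + 1):
        row = []
        for j in range(1, M + 1):
            if i == 1:
                row.append(1.0 / math.sqrt(M))
            else:
                angle = (2 * j - 1) * (i - 1) * math.pi / (2 * M)
                row.append(math.sqrt(2.0 / M) * math.cos(angle))
        psi.append(row)
    return psi


def gram(psi):
    """Psi times its transpose; the identity for an orthonormal transform."""
    M = len(psi)
    return [[sum(psi[r][j] * psi[s][j] for j in range(M)) for s in range(M)]
            for r in range(M)]


psi = dct_matrix(16)
g = gram(psi)
max_dev = max(abs(g[r][s] - (1.0 if r == s else 0.0))
              for r in range(16) for s in range(16))
print("largest deviation from the identity:", max_dev)
```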


4.2.1 Head and Shoulders Images - All three methods are compared for this data set. Since it contains motion images with motion being localized in certain areas of an image frame, it is a good candidate for comparing the effect of adaptations. For interframe (or 3-D transform) coding a sub-block size of 16 x 16 x 16 was chosen as in [67]. To compare the performance of interframe and intraframe schemes, two sub-block sizes for intraframe transform coding were chosen. The first, 16 x 16, is used to compare the contribution of the temporal redundancy exploited by the interframe scheme. The second size, 64 x 64, is used to have the same number of samples in the intraframe and interframe sub-blocks. Since the performance of a scheme also depends on the knowledge of the statistics, some of the statistical models are also compared.

For intraframe coding we compare three statistical models: (i) the separable covariance model of (A-1) with correlation parameters equal to .95; (ii) the measured statistical model, which is obtained by suppressing index Z in (4-11), given in Table A-1; and (iii) the isotropic covariance model with correction given in Table A-4 (see appendix A for a discussion on modeling intraframe statistics).


Table 4-1 gives the performance of these models for a sub-block size of 16 x 16. As expected, the performance of the separable model is the worst and that of the measured statistical model is the best. The isotropic model with correction is closer to the measured statistical model at low bit-rates and in between at higher bit-rates. Also, the advantage of the measured statistical model increases with the bit-rate. This is also expected, since at lower bit-rates only the low order (or low spatial frequency) transform coefficients are transmitted (from the distortion-rate considerations) and usually the simple parametric models such as the separable and the isotropic (without correction) do well in predicting their statistics (as can be seen by comparing Tables A-1 and A-3).

Table 4-2 shows the performance of the intraframe scheme for the sub-block size of 64 x 64 for two of the models (the separable model is omitted, since it is expected to do relatively worse for larger sub-block sizes). We notice an improvement of 1-2 dB for the isotropic model and about 3-4 dB for the measured statistical model over the 16 x 16 case. Thus, the relative advantage of the measured statistical model increases with the sub-block size. This again is expected.

Figures 4-2 and 4-3 show some of the images corresponding to frame #8 resulting from the intraframe transform coding. In general, the relative visual quality for various models and array sizes is in agreement with the mean square performance. At low SNR (below 35 dB), the noise in the background areas is quite visible in addition to the blurring of sharp features (or edges). For high SNR (above 37 dB) the visual quality
TABLE 4-1

SNR FOR NON-ADAPTIVE INTRAFRAME COSINE TRANSFORM CODING OF THE HEAD
AND SHOULDERS IMAGES FOR THREE STATISTICAL MODELS. SUB-BLOCK SIZE =
16 x 16.

                               STATISTICAL MODEL
S.N.   BIT-RATE      SEPARABLE     ISOTROPIC WITH     MEASURED
       PER PIXEL                   CORRECTION         STATISTICS

 1       .25         28.02 dB      29.76 dB           30.29 dB
 2       .50         30.03 dB      33.41 dB           34.16 dB
 3      1.00         35.47 dB      37.41 dB           39.63 dB
 4      2.00         40.92 dB      42.38 dB           45.52 dB
TABLE 4-2

SNR FOR NON-ADAPTIVE INTRAFRAME COSINE TRANSFORM CODING OF THE HEAD
AND SHOULDERS IMAGES FOR TWO STATISTICAL MODELS. SUB-BLOCK SIZE =
64 x 64.

                          STATISTICAL MODEL
S.N.   BIT-RATE      ISOTROPIC WITH     MEASURED
       PER PIXEL     CORRECTION         STATISTICS

 1       .25         31.49 dB           33.29 dB
 2       .50         35.31 dB           38.06 dB
 3      1.00         39.27 dB           43.71 dB
 4      2.00         43.44 dB           48.49 dB

TABLE 4-3

SNR FOR NON-ADAPTIVE INTERFRAME COSINE TRANSFORM CODING OF THE HEAD
AND SHOULDERS IMAGES FOR TWO STATISTICAL MODELS. SUB-BLOCK SIZE =
16 x 16 x 16.

                       STATISTICAL MODEL
S.N.   BIT-RATE      SEPARABLE     MEASURED
       PER PIXEL                   STATISTICS

 1       .10         28.53 dB      30.72 dB
 2       .25         31.31 dB      34.35 dB
 3       .50         33.34 dB      37.60 dB
 4      1.00         35.49 dB      41.89 dB

Table 4-3 shows the performance of the non-adaptive interframe transform coding method for the separable model of (4-10b) with all correlation parameters equal to .95, and the measured statistical model of (4-11). Tables 4-4 and 4-5 show the bit allocation for these two models. On comparing these two tables we notice that the separable model wastes a large number of bits on high spatial frequencies which contain negligible mean square energy. Thus, it is a poor model (as pointed out earlier) for predicting the variance of high spatial frequencies.

On comparing Tables 4-1 and 4-3 we note that for the separable models, the gains due to temporal redundancy are only realized at low bit-rates (once again, for the same reason as in the intraframe case), and at 1 bit/pixel there are no practical gains. For the measured statistical models, on the other hand, there are gains of 2-4 dB arising from temporal redundancy, the gains decreasing with increasing bit-rate.

The comparison of Tables 4-2 and 4-3 shows that, for the measured statistical models, the gains achieved by the exploitation of the temporal redundancy can be surpassed (at .50 and 1.00 bit/pixel) by an intraframe scheme by simply increasing its sub-block size so that the total sub-block sizes of the interframe and the intraframe schemes are the same. This result, which appears unexpected at first, arises because we have modeled the temporal statistics by stationary processes, which is a poor representation in areas of moderate and large motion.
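The comparisons drawn from Tables 4-1, 4-2 and 4-3 can be reproduced with a few lines of arithmetic. The sketch below uses only the measured-statistics SNR values at the three bit-rates common to all three tables; the variable names are illustrative.

```python
# SNR values (dB) for the measured-statistics model, copied from
# Tables 4-1, 4-2 and 4-3 at the bit-rates common to all three tables.
rates = [0.25, 0.50, 1.00]
intra_16 = [30.29, 34.16, 39.63]      # Table 4-1, 16 x 16
intra_64 = [33.29, 38.06, 43.71]      # Table 4-2, 64 x 64
inter_16 = [34.35, 37.60, 41.89]      # Table 4-3, 16 x 16 x 16

# Gain attributable to temporal redundancy (same 16 x 16 spatial support).
temporal_gain = [round(e - a, 2) for a, e in zip(intra_16, inter_16)]
# Interframe scheme against an intraframe scheme with equal sample count.
inter_minus_intra64 = [round(e - b, 2) for b, e in zip(intra_64, inter_16)]

for r, g, d in zip(rates, temporal_gain, inter_minus_intra64):
    print(f"{r:.2f} bit/pixel: temporal gain {g:+.2f} dB, "
          f"interframe minus 64 x 64 intraframe {d:+.2f} dB")
```

The first difference shrinks as the bit-rate grows, and the second changes sign between .25 and .50 bit/pixel, so on these figures the 64 x 64 intraframe coder overtakes the interframe coder from .50 bit/pixel upward.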

One quantity to which the relative performance of the intraframe and the interframe schemes for motion images is definitely related is the amount of motion between successive frames. A lower value of this quantity (resulting in high temporal correlation) will favor the interframe scheme if all the frames are required to be transmitted; however, modifications due to motion must be made in areas of significant motion.

Since the intraframe scheme requires much less memory (M image rows) than the interframe scheme (L frames), the above result shows that the non-adaptive interframe transform scheme is unattractive from the MSE point of view.

Parts (a) and (b) of Fig. 4-4 show the images corresponding to frame #8 for the non-adaptive interframe transform coding. Comparing Fig. 4-4(a) with Figs. 4-2(a)-(i) and 4-2(b)-(i), we note that for comparable levels of distortion, at a low SNR, the distortion due to the intraframe and the interframe transform coding is differently distributed. The stationary areas are much less noisy (or better reproduced) in the interframe coding, while the moving area edges are more blurred. This result is expected. Thus, from the point of view of the exchange of spatial and temporal resolution for the motion images, the interframe transform coding method might be more desirable for the same mean square error. This strengthens the belief that the MSE alone is not a sufficient criterion in comparing various schemes. However, at high SNR values the MSE criterion seems reasonable for comparisons. Figure 4-4(b) shows a significant improvement due to measured statistics over the separable model.

Table 4-6 gives the parameters of the adaptive interframe transform coding scheme. Table 4-7 gives the performance of an adaptive interframe transform coding scheme without adapting the statistics to each class. This was done to separately study the effects of the adaptations of bit-rates and the statistics to the classification. For the first entry of this table even the bit-rates were forced to be the same. Thus, it corresponds to the non-adaptive case. This was done to see the distribution of the distortion among the classes. Table 4-8 gives the performance of the adaptive scheme for adaptive bit-rates as well as statistics.

From the first entry of Table 4-7 we see that for the non-adaptive case, the average distortion increases with the class number (as expected) and there is about 13 dB difference between class 1 (containing areas of low spatial and temporal activity) and class 4 (containing areas of high spatial and temporal activity). Comparing this with entry 3 of Table 4-7 we see that, for the same average rate, the adaptation of bit-rates alone results in great improvement in the distortion for classes 3 and 4 and in an overall increase of 1.6 dB. From Table 4-8 we see that an additional gain of 2-2.5 dB is achieved by adapting the statistics. Thus the overall improvement for the adaptive scheme over the non-adaptive (interframe) scheme is about 4 dB, or a compression gain by a factor of about 2, in addition to the better reproduction of the high spatial activity areas and the motion.

Figures 4-4(c) and 4-4(d) show some images for the adaptive interframe transform coding. We can see that the adaptive scheme does far better than the non-adaptive scheme. The performance of the adaptive scheme at .1 bit/pixel is superior to the non-adaptive scheme with the separable model at .5 bit/pixel. This is evident by comparing images (a) and (c) of Figure 4-4, where the latter reproduces motion much better (see the lips, the eyes, and the tie). Thus, at low SNR we obtain a compression gain of 5 by the adaptive scheme over the non-adaptive scheme with the separable model and still get better results.

4.2.2 Chemical Plant Images - Since the Chemical Plant images were generated by an airborne camera, the motion is more evenly distributed. Since any adaptation without motion compensation is not expected to result in significant improvement, only non-adaptive schemes were considered.

Table 4-9 and Figure 4-5 show the results for the measured statistical model. We notice that for these images, the signal to noise ratios obtained are much lower than those for the Head and Shoulders images. This is because these images have much lower correlation and the data is more noisy. Comparing the relative performances once again, we note that the intraframe scheme at sub-block size of 64 x 64 is almost as good as the interframe scheme for the sub-block size of 16 x 16 x 16.

4.2.3 X-Ray Projection Images - The projection images used in this experiment are the 2-D x-ray projections of a 3-D object at various angles around a fixed axis and do not contain motion. The stationarity of the statistics is a more valid assumption for these images and thus they are better candidates for interframe transform coding. The given images have very high correlation between the rows (along i-axis). So a sub-block size of 8 x 32 x 16 was selected.

Table 4-10 shows the performance of the non-adaptive transform coding method with measured statistics. Figure 4-6 shows an original and some of the coded images at various compression ratios. In Figure 4-7 the mean square error is plotted as a function of frame (or image) number. The periodic occurrence of the error peaks after every eighth frame is due to the fact that these frames lie on the boundaries of our 8 x 32 x 16 sub-blocks. However, this effect diminishes for lower compression ratios; e.g., an almost constant mean square error at the compression ratio of [...] is achieved.
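The columns of Table 4-10 are mutually related, and the relation can be checked numerically. The sketch below assumes 8-bit source pixels (so that compression ratio = 8 / bit-rate) and a peak-signal definition of SNR with a peak value of 255. Both are assumptions inferred from the tabulated numbers rather than definitions quoted from this section.

```python
import math

# (bit-rate, compression ratio, mean square error, SNR in dB) from Table 4-10.
table_4_10 = [
    (0.04, 200, 5.522, 40.71),
    (0.125, 64, 2.153, 44.80),
    (0.25, 32, 1.277, 47.07),
    (0.50, 16, 0.757, 49.34),
    (1.00, 8, 0.373, 52.41),
    (2.00, 4, 0.121, 57.31),
]

BITS_PER_PIXEL = 8   # assumption: 8-bit source pixels
PEAK = 255.0         # assumption: peak value of an 8-bit pixel


def compression_ratio(bit_rate):
    return BITS_PER_PIXEL / bit_rate


def snr_db(mse):
    return 10.0 * math.log10(PEAK ** 2 / mse)


for rate, cr, mse, snr in table_4_10:
    print(f"{rate:5.3f} b/p  CR {compression_ratio(rate):6.1f} (table {cr})  "
          f"SNR {snr_db(mse):5.2f} dB (table {snr:.2f})")
```

Under these assumptions the recomputed values agree with the table to within the rounding of the listed mean square errors.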
TABLE 4-9

SNR FOR NON-ADAPTIVE INTRAFRAME AND INTERFRAME COSINE TRANSFORM CODING
OF THE CHEMICAL PLANT IMAGES WITH MEASURED STATISTICS.

                            INTRAFRAME                      INTERFRAME
S.N.   BIT-RATE      SUB-BLOCK        SUB-BLOCK        SUB-BLOCK
       PER PIXEL     SIZE = 16 x 16   SIZE = 64 x 64   SIZE = 16 x 16 x 16

 1       .5          27.26 dB         28.51 dB         28.65 dB
 2      1.0          30.73 dB         32.10 dB         32.16 dB
 3      2.0          36.37 dB         38.14 dB         37.97 dB

TABLE 4-10

PERFORMANCE OF THE NON-ADAPTIVE INTERFRAME TRANSFORM CODER FOR THE X-RAY
PROJECTION IMAGES.

S.N.   BIT-RATE      COMPRESSION     MEAN SQUARE     SIGNAL TO
       PER PIXEL     RATIO           ERROR           NOISE RATIO

 1       .04           200           5.522           40.71 dB
 2       .125           64           2.153           44.80 dB
 3       .25            32           1.277           47.07 dB
 4       .50            16           0.757           49.34 dB
 5      1.00             8           0.373           52.41 dB
 6      2.00             4           0.121           57.31 dB
[Figure caption fragment: ... Intraframe and Interframe Transform Coding using Measured Statistics in Transform Domain.]

[Figure 4-6: Images resulting from data compression of a projection image at angle of view = 0°. Surviving panel labels: (c) C.R. = 64; (d) C.R. = 16.]

[Figure 4-7: Variation of Mean Square Error as a Function of Frame Number for Transform Coding of the X-ray Projection Images.]
In the foregoing experiments we have encoded the projection data and shown the coder performance on this data. However, in practice, the medically useful information lies in the 3-D view or, equivalently, in the multiple adjacent transaxial cross-sections (also called levels) of the object, which are reconstructed from the projection data. The various levels were separately reconstructed by approximating the x-ray cone beam by a sequence of parallel divergent fan beams. This is a reasonable approximation for x-ray cone beam sources far from the object, and permits use of a two dimensional reconstruction algorithm for each level. For our data a divergent beam two-dimensional reconstruction algorithm [27] was used. Final reconstructed images for various levels are of size 64 x 64 and were displayed after a sample averaging of three adjacent video lines.

Figures 4-8(a)-(c) show the reconstructed images at levels 34 and 94 (of the total 128 possible) reconstructed from the original as well as from the compressed projection data. Figure 4-8(d) shows the images of the error signal between the original and the compressed reconstructions at various compression ratios for level 94.

The effect of the data compression on resolution is readily observed by viewing the reconstructed images at level 94. Notice the blurring of two small dark circular areas at about the 4 o'clock position (small air passages in the lung called bronchi) as the compression ratio increases. The smaller of the two areas (upper right) starts disappearing at a compression ratio of 16, while the other one starts disappearing at 32. Generally the larger features are retained at even higher compression ratios. The error images show that at lower compression ratios the sample-to-sample errors are more or less uncorrelated, while at higher compression ratios the object structure is more visible in the errors.


[Figure 4-8: Reconstruction Images.
 (a) Reconstruction of Level #34: (i) Original, (ii) C.R. = 8, (iii) C.R. = 32, (iv) C.R. = 200.
 (b) Reconstruction of Level #94: (i) Original, (ii) C.R. = 4, (iii) C.R. = 8, (iv) C.R. = 16.
 (c) Reconstruction of Level #94: (i) Original, (ii) C.R. = 32, (iii) C.R. = 64, (iv) C.R. = 200.
 (d) Normalized error in reconstruction of Level #94 due to data compression: (i) C.R. = 4, (ii) C.R. = 16, (iii) C.R. = 64, (iv) C.R. = 200.]
The high values of signal to noise ratio achieved at various compression levels demonstrate that large redundancy is present in the projection images. Preliminary indications are that compression ratios of 8 to 16 are realizable for those applications where very high quality (pixel by pixel) reproduction of the reconstructed images is desired. This would include applications where the reconstructed images are to be used for detection and quantification of objects of small size (e.g., holes in the septum of the heart or distortion of vessels). In other applications, where the medically useful information lies in the size, location and the boundaries of larger objects (e.g., motion of heart muscle mass, etc.), larger compression ratios, 64 to 200, may be acceptable.
CHAPTER V

INTERFRAME HYBRID CODING SCHEMES


Because of the rapid changes in the temporal characteristics of motion images, it is desirable to have a predictive coding scheme along the temporal axis. On the other hand, for spatial information transform coding is more efficient. Hybrid coding utilizes the superior performance of transform coding in the spatial domain and the simplicity of DPCM to exploit the temporal correlation with tremendous savings in memory (it requires only a single frame of storage for a first order DPCM). The motion compensation methods of chapter II can be successfully employed in this design.

Figure 5-1 shows a simple (or non-adaptive) interframe hybrid coding scheme. First we assume the data to be wide sense stationary. U_k is an M x N sub-block of the kth frame. V_k is obtained by a 2-D transformation of U_k, and is defined similar to the 3-D transform in (4-1) with L = 1. It can also be expressed as

    V_k = \Psi_M U_k \Psi_N^T

Each transform coefficient is independently coded by DPCM along the temporal axis (or index k) via a suitable autoregressive model representing the statistical characteristics of the data in the temporal direction. In order to limit the storage to one frame, we only consider first order models. For images having piecewise uniform motion from one frame to the next, a first order model would be reasonable. Thus, we have†

† For any matrix A we denote A(i,j) ≡ a(i,j) to be its (i,j)th element.


Figure  5-2:  An  Adaptive  Interframe  Hybrid  Coding  Scheme. 
    v_k(m,n) = a_{m,n} v_{k-1}(m,n) + e_k(m,n),   |a_{m,n}| < 1,   1 \le m \le M,  1 \le n \le N.        (5-1)

For simplification we assume

    a_{m,n} = a = Constant.        (5-2)

Although the above simplifying assumption is not very realistic for motion images, it does not affect the coder performance too adversely (once the assumption of stationarity has been made). This is because at low bit-rates only the low order (or high mean square energy) transform coefficients are transmitted. For these a constant value for the predictor coefficients a_{m,n}, given by (5-2), has been found to be adequate. With reference to Figure 5-1, the various predictive coding equations, for each 1 ≤ m ≤ M, 1 ≤ n ≤ N, are given as follows.

Predictor (at the Transmitter):

    \bar{v}_k(m,n) = a v^{\bullet}_{k-1}(m,n)

Quantizer:

    Input:   e_k(m,n) = v_k(m,n) - \bar{v}_k(m,n)

    Output:  e^{\bullet}_k(m,n)

Reconstituted Output at the Receiver:

    v^*_k(m,n) = a v^*_{k-1}(m,n) + e^*_k(m,n)

    v^{\bullet}_k(m,n) = \bar{v}_k(m,n) + e^{\bullet}_k(m,n)

where e^*_k(m,n) is the received value of e^{\bullet}_k(m,n) at the receiver and, in the absence of channel noise, e^*_k(m,n) = e^{\bullet}_k(m,n).
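The predictor, quantizer and reconstruction steps can be sketched for a single transform coefficient as follows. A uniform quantizer is used here purely as a stand-in, since the Max quantizer for a Laplacian density is not reproduced in this section; the function names, the step size and the sample values are likewise assumptions of this sketch.

```python
def uniform_quantizer(step):
    """Stand-in quantizer (assumption): the text uses a Max quantizer
    matched to a Laplacian density, which is not reproduced here."""
    return lambda e: step * round(e / step)


def hybrid_dpcm(v, a, quantize, v0=0.0):
    """First-order DPCM of one transform coefficient along the frame index.

    v        : coefficient values v_k(m,n) for k = 1, 2, ...
    a        : prediction coefficient of (5-2)
    quantize : maps the prediction error to its quantized value
    v0       : reconstructed value used to start the recursion
    Returns the quantized errors and the receiver reconstruction,
    assuming a noise-free channel.
    """
    prev = v0
    errors, recon = [], []
    for vk in v:
        pred = a * prev                 # predictor
        eq = quantize(vk - pred)        # quantized prediction error
        prev = pred + eq                # reconstruction shared by both ends
        errors.append(eq)
        recon.append(prev)
    return errors, recon


v = [10.0, 10.4, 11.1, 9.7, 9.9]        # illustrative coefficient values
step = 0.5
errors, recon = hybrid_dpcm(v, a=0.95, quantize=uniform_quantizer(step), v0=v[0])
max_err = max(abs(x - y) for x, y in zip(v, recon))
print(errors, recon, max_err)
```

Because the predictor operates on the reconstructed rather than the original previous value, the reconstruction error in each frame equals the quantization error of that frame's prediction error and does not accumulate; with the uniform stand-in it is bounded by half a step.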

For the Markov representation of (5-1), we have [31,86]

    \sigma_e^2(m,n) = E[e_k^2(m,n)] = \sigma_v^2(m,n)(1 - a^2),        (5-3)

where

    \sigma_v^2(m,n) = E[v_k^2(m,n)]        (5-4)

and the values of \sigma_v^2(m,n) are calculated either from a 2-D spatial domain covariance model or by direct measurement as in the previous chapter. (See appendix A for details.)

We assume that each e_k(m,n) is Laplacian in distribution and is quantized by its Max quantizer. Let b_{m,n} be the number of bits required to code e_k(m,n). With these assumptions it can be easily shown that (5-3) becomes [75,86]

    \sigma_e^2(m,n) = \sigma_v^2(m,n)(1 - a^2) / \{1 - q(b_{m,n}) a^2\}        (5-5)

where q(·) is a quantization distortion function defined by (4-5).
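The step from (5-3) to (5-5) can be sketched as follows. The derivation assumes zero-mean, wide sense stationary coefficients with temporal correlation coefficient a, and a quantization error \delta_k that is uncorrelated with the innovation and has variance q(b)\sigma_e^2; the symbols \varepsilon_k and \delta_k are introduced here only for this sketch, and the indices (m,n) are dropped for brevity.

```latex
% Innovation of the first-order model, with E[v_k v_{k-1}] = a \sigma_v^2:
\varepsilon_k = v_k - a\,v_{k-1}, \qquad
E[\varepsilon_k^2] = \sigma_v^2 - 2a\,(a\,\sigma_v^2) + a^2\sigma_v^2
                   = \sigma_v^2\,(1 - a^2).

% With quantization in the loop the predictor uses the reconstructed value
% v^{\bullet}_{k-1} = v_{k-1} - \delta_{k-1}, so the quantizer input is
e_k = v_k - a\,v^{\bullet}_{k-1} = \varepsilon_k + a\,\delta_{k-1}.

% Taking variances, with E[\delta_{k-1}^2] = q(b)\,\sigma_e^2:
\sigma_e^2 = \sigma_v^2\,(1 - a^2) + a^2\,q(b)\,\sigma_e^2
\;\;\Longrightarrow\;\;
\sigma_e^2 = \frac{\sigma_v^2\,(1 - a^2)}{1 - q(b)\,a^2}.
```

For q(b) = 0 (no quantization error) the last expression reduces to (5-3), and for q(b) = 1 (no bits assigned) it gives \sigma_e^2 = \sigma_v^2.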

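Assuming a noise-free channel, the average mean square distortion is given by

    D = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} q(b_{m,n})\, \sigma_e^2(m,n)
      = \frac{1}{MN} \sum_{m} \sum_{n} \sigma_v^2(m,n)\, \frac{(1 - a^2)\, q(b_{m,n})}{1 - q(b_{m,n})\, a^2}        (5-6)

As in chapter IV, we assume

    b_{m,n} = Integer \ge 0        (5-7)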
and

    \sum_{m=1}^{M} \sum_{n=1}^{N} b_{m,n} = MNb        (5-8)

where b is the average bit-rate. The selection of b_{m,n} (or the bit allocation) is made such that (5-6) is minimized. Once again, we use the integer bit allocation algorithm of [86] to achieve this.
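To illustrate the constrained minimization, the sketch below evaluates the distortion as the average of q(b)·σ_e², with σ_e² = σ_v²(1 - a²)/(1 - q(b)a²), and assigns integer bits greedily. Two things here are assumptions and not the method of the text: q(b) is taken as 2^(-2b) in place of the Laplacian Max quantizer distortion of (4-5), and the greedy rule is a simple stand-in for the integer bit allocation algorithm of [86], which is not described in this section. The variances are illustrative.

```python
import heapq


def q(b):
    """Assumed normalized quantizer distortion, 2^(-2b).
    The text instead takes q from (4-5) for a Laplacian Max quantizer."""
    return 2.0 ** (-2 * b)


def term(var_v, a, b):
    """Distortion contribution of one coefficient coded with b bits."""
    return var_v * (1 - a * a) * q(b) / (1 - a * a * q(b))


def distortion(var_v, a, bits):
    """Average distortion over all coefficients for a given allocation."""
    return sum(term(s, a, b) for s, b in zip(var_v, bits)) / len(var_v)


def allocate(var_v, a, total_bits):
    """Greedy allocation: each bit goes to the coefficient whose distortion
    drops the most. A stand-in for the integer algorithm of [86]."""
    bits = [0] * len(var_v)
    heap = [(-(term(s, a, 0) - term(s, a, 1)), i) for i, s in enumerate(var_v)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        _, i = heapq.heappop(heap)
        bits[i] += 1
        b = bits[i]
        gain = term(var_v[i], a, b) - term(var_v[i], a, b + 1)
        heapq.heappush(heap, (-gain, i))
    return bits


variances = [100.0, 25.0, 4.0, 1.0]      # illustrative sigma_v^2 values
bits = allocate(variances, a=0.9, total_bits=8)
print(bits, distortion(variances, 0.9, bits),
      distortion(variances, 0.9, [2, 2, 2, 2]))
```

With the same total of 8 bits, the allocation weighted toward the high-variance coefficients gives a lower average distortion than the uniform allocation of 2 bits each.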

5.1 Adaptive Interframe Hybrid Coding Scheme:

The adaptive strategy of section 4.1 can also be applied to the hybrid coding method discussed above with appropriate modification. (In fact this algorithm was first developed for hybrid coding, and then extended to interframe transform coding.)

The activity index of a sub-block is chosen to be its interframe variance (IFV) given by

    \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \{u_k(m,n) - u_{k-1}(m,n)\}^2 .

Once again, a sub-block is classified into one of four classes by choosing suitable values for the activity index thresholds. Different bit-rates and statistics (prediction coefficient a, and transform coefficient variances \sigma_v^2(m,n)) for each class are appropriately selected or measured. This results in a variable bit-rate.

Figure 5-2 shows the schematic of Fig. 5-1 with necessary modifications for the adaptations.


5.2 Hybrid Coding with Motion Compensation

It has been already pointed out in chapter II and elsewhere that the temporal direction (for motion images) primarily consists of a deterministic component (i.e., motion). It was also shown in chapter II that this component can be satisfactorily modeled by piecewise linear translations, and that motion compensation based on this model results in a tremendous reduction in the interframe variance (hence, improvement in the temporal correlation). Now we consider how the motion measurement methods of chapter II could be incorporated in the hybrid coding schemes discussed above. Let (ξ_1, ξ_2) be the motion coordinates of the sub-block U_k relative to the (k-1)th frame. Then the motion compensation is incorporated simply by replacing U_{k-1} by \tilde{U}_{k-1} given by

    [...]

and thus replacing V_{k-1} by

    \tilde{V}_{k-1} = \Psi_M \tilde{U}_{k-1} \Psi_N^T

The motion coordinates (ξ_1, ξ_2) are coded together with the other information.
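The replacement of the previous-frame sub-block by its displaced version can be sketched as below. The sketch assumes whole-pixel displacements, a particular sign convention for the shift, and clamping at the frame edges; none of these details is specified in this section, and the function name and test frames are illustrative.

```python
def compensated_block(prev_frame, top, left, M, N, shift):
    """Return the M x N block of the previous frame displaced by `shift`.

    prev_frame : list of rows of the (k-1)th frame
    top, left  : position of the current sub-block in the kth frame
    shift      : (row, column) displacement in whole pixels; the sign
                 convention here is an assumption of this sketch
    Coordinates falling outside the frame are clamped to the nearest edge.
    """
    rows, cols = len(prev_frame), len(prev_frame[0])
    dr, dc = shift
    block = []
    for m in range(M):
        r = min(max(top + m + dr, 0), rows - 1)
        block.append([prev_frame[r][min(max(left + n + dc, 0), cols - 1)]
                      for n in range(N)])
    return block


# Illustrative frames: the current frame is the previous one moved right by 1.
prev_frame = [[10 * r + c for c in range(6)] for r in range(4)]
curr_frame = [[row[max(c - 1, 0)] for c in range(6)] for row in prev_frame]
current = [row[2:4] for row in curr_frame[1:3]]
plain = compensated_block(prev_frame, 1, 2, 2, 2, (0, 0))
moved = compensated_block(prev_frame, 1, 2, 2, 2, (0, -1))
print(current, plain, moved)
```

With the correct displacement the compensated block coincides with the current block, so the temporal prediction error for this sub-block vanishes; without compensation the two blocks differ.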

The frame skipping and the interpolation of skipped frames can be incorporated in the schemes as described in section 2.5.


5.3  Distortion-Rate  Curves  from  Models  of  Interframe  Motion: 

In  section  2.1,  the  relationship  between  the  temporal  correlation, 
the  distributions  of  the  Interfrarae  motion  uncertainty  (i.e.,  dx,  dy) ,  and 
the  measurement  noise  was  established.  Assuming  a  first  order  Markov 
separable  model  along  the  temporal  dimension  and  a  model  for  intraframe 
covariance,  we  can  thus  calculate  the  distortion-rate  functions  from  the 
model  of  motion  uncertainty. 

Let  dx  and  dy  represent  the  motion  uncertainty  in  pixels/frame 


along  X  and  y  axes,  respectively.  Let  N(y,CT)  and  B(a,b)  denote  the 
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Guassian  density  with  mean  p  and  standard  deviation  a,  and  uniform  density 
in  the  interval  [a,b],  respectively.  Let  us  assume  that  dx  and  dy  are 
identically  distributed.  We  also  assume  that  images  are  noise-free,  i.e., 

=  0  in  (2-6). 

For  the  calculation  of  distortion- rate  functions,  we  assume  the 
intraframe  covariance  model  to  be  the  isotropic  model  of  (A-2)  with 
=  Pj  =  p.  We  choose  two  values  for  parameter  p,  viz.,  p  =  .95  and 
p  =  .90.  The  first  one,  which  we  call  Isotroplc-1,  is  a  good  approximation 
for  the  Head  and  Shoulders  images.  The  second  one,  which  we  call  Isotropic-2, 
is  a  good  approximation  for  the  Qiemical  Plant  images. 

We have chosen two distributions for dx and dy, the Gaussian and
the uniform. For these distributions, we use the approximation of (2-6)
to calculate the temporal correlation coefficient, which is used as the
prediction coefficient α for the hybrid coding scheme described in this
chapter. We choose a sub-block size of 16 x 16, and use (5-6) for the
calculation of distortion.
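
As a rough illustration of how a distribution of motion uncertainty translates into a temporal correlation, the following sketch averages an isotropic correlation over a uniform density B(a,b) for dx and dy. The averaging rule and the form ρ^sqrt(dx² + dy²) are assumptions made only for this illustration; they stand in for the approximation (2-6) and the model (A-2), neither of which is reproduced in this section. The function names are hypothetical.

```python
import math


def isotropic_corr(rho, dx, dy):
    """Assumed isotropic correlation between points displaced by (dx, dy) pixels."""
    return rho ** math.hypot(dx, dy)


def mean_corr_uniform(rho, a, b, steps=200):
    """Average isotropic_corr over dx, dy independent and uniform on [a, b].

    Uses the midpoint rule on a steps x steps grid.
    """
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        dx = a + (i + 0.5) * h
        for j in range(steps):
            dy = a + (j + 0.5) * h
            total += isotropic_corr(rho, dx, dy)
    return total / (steps * steps)


if __name__ == "__main__":
    # Wider uniform densities (larger motion uncertainty) give a lower correlation.
    for a, b in [(-0.5, 0.5), (-2.0, 2.0), (-4.0, 4.0)]:
        print(f"B({a},{b}): {mean_corr_uniform(0.95, a, b):.4f}")
```

Under these assumptions the averaged correlation falls as the interval widens, which is the qualitative behaviour the distortion-rate curves rely on.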

Figure 5-3 shows some distortion-rate curves for various distributions
of dx and dy for unit variance data. The distributions corresponding
to the higher variances of dx and dy (i.e., B(-2,2), B(-4,4),
N(0,1)) could be assumed as reasonable models for coding without motion
compensation (dx = dx), and those with lower variances (i.e., B(-.5,.5),
N(0,.25)) for coding with motion compensation. The curve for α = 0
corresponds to intraframe transform coding.

Table 5-1 gives the rates for a fixed distortion for each intraframe
covariance model. These are also shown on Fig. 5-3 by dotted lines.

We notice that in the absence of motion compensation a hybrid coding
scheme achieves a compression gain (over the intraframe transform
[...]
Isotropic-2 model, for the distributions considered. Motion compensation
results in an additional compression gain (over hybrid coding without
motion compensation) by a factor of ≈1.35 for Isotropic-1 and ≈1.5 for
Isotropic-2.


TABLE 5-1

COMPARISONS OF RATES FOR INTERFRAME HYBRID CODING
FOR VARIOUS DISTRIBUTIONS OF MOTION UNCERTAINTY.

DISTRIBUTION            RATE (BITS/PIXEL)
OF dx, dy         ISOTROPIC-1        ISOTROPIC-2
                  D = -16.5 dB       D = -18 dB

B(-.5,.5)         .4073              [illegible]
N(0,.25)          .4518              1.00
B(-2,2)           .5278              -
B(-4,4)           -                  1.49
N(0,1)            .5754              1.34
α = 0             1.0                2.0

5.4 Experimental Results:

Since the hybrid coding described above requires knowledge of the
initial frame, we assume that the first frame of the data is available as
the initial condition without any distortion.

5.4.1 Head and Shoulders Images - All the hybrid coding schemes -
non-adaptive, adaptive, and adaptive with motion compensation (with and
without frame skipping and interpolation) - were applied to this data set.
A sub-block size of 16 x 16 was selected. An isotropic covariance model
with correction, as described in appendix A, has been used for modeling the
intraframe statistics.

Table 5-2 gives the performance of the non-adaptive hybrid coding
scheme, for which suitable values of the correlation parameters were
found to be ρ_i = .955, ρ_j = .945, α = .80. Comparing Table 5-2 with the
entries (for the isotropic model with correction) in Table 4-1, we notice
an improvement of about 2-2.5 dB at the bit-rates considered, or,
equivalently, a compression gain by a factor of about 1.5, as a result of
temporal correlation.

Figure 5-4 shows the histogram of the activity index. Although
the dynamic range of the activity index is roughly 0-3000, the histogram
shows the distribution in the range 0-200 (in which about 90% of the samples
are contained) to better utilize the range of the histogram. All the
sub-blocks having an activity index above 200 have been lumped in the last column.
We see that a great number of sub-blocks have a very low activity index
(mainly due to the stationary background). The threshold values of the activity
index chosen are marked by arrows. Our experimental results indicate that
the coding performance is not very sensitive to threshold selection.

Tables 5-3 and 5-4 show the parameters and the performance of the
4 class adaptive hybrid scheme. Comparison of the correlation parameters
of Table 5-3 with those for the non-adaptive hybrid scheme confirms our
earlier statement (in chapters II and IV) that the classification based
on the activity index (IFV) divides the images into classes of varying
spatial activity (characterized by ρ_i and ρ_j) in addition to the varying
temporal activity. The improvement due to the adaptations of the bit-rates
and the spatial-temporal statistics is about 4-4.5 dB, or equivalently, an
additional compression gain (over the non-adaptive hybrid) by a factor of 2.
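
The classification step can be sketched as follows, using the class boundaries listed in Table 5-3 (0-20, 20-60, 60-200, 200 and above). This is only an illustrative sketch: the function names are hypothetical, the computation of the activity index itself is not shown, and the treatment of a value falling exactly on a boundary (assigned here to the higher class) is an assumption.

```python
import bisect
from collections import Counter

# Class boundaries of the activity index, as listed in Table 5-3.
THRESHOLDS = (20, 60, 200)


def classify(activity_index, thresholds=THRESHOLDS):
    """Return the 1-based class number of one sub-block.

    A value equal to a boundary goes to the higher class (an assumption).
    """
    return bisect.bisect_right(thresholds, activity_index) + 1


def class_probabilities(activity_indices, thresholds=THRESHOLDS):
    """Fraction of sub-blocks falling in each class."""
    counts = Counter(classify(a, thresholds) for a in activity_indices)
    total = len(activity_indices)
    return {c: counts.get(c, 0) / total for c in range(1, len(thresholds) + 2)}


if __name__ == "__main__":
    sample = [0, 5, 30, 250]  # made-up activity indices, for illustration only
    print(class_probabilities(sample))
```

Each class then receives its own bit-rate and its own set of correlation parameters, as tabulated.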

Figure  5-5  shows  the  signal  to  noise  ratio  as  a  function  of  frame 
number  for  the  non-adaptive  and  the  adaptive  hybrid  schemes  and  the  Intra- 


TABLE 5-2

PERFORMANCE OF THE NON-ADAPTIVE HYBRID CODING
SCHEME FOR THE HEAD AND SHOULDERS DATA.

S.N.      BIT-RATE      SIGNAL TO NOISE RATIO

1         .25           32.28 dB
2         .50           35.49 dB

TABLE 5-3

PARAMETERS OF THE 4 CLASS ADAPTIVE HYBRID
SCHEME FOR HEAD AND SHOULDERS DATA.

CLASS     ACTIVITY     PROBABILITY OF     CORRELATION PARAMETERS
NO.       INDEX        OCCURRENCE         ρ_i            ρ_j       α

1         0-20         .5768              [illegible]    .975      .98
2         20-60        .1737              .945           .94       .93
3         60-200       .1320              .92            .905      .80
4         200-         .1174              .86            .84       .40

TABLE 5-4

PERFORMANCE OF THE ADAPTIVE HYBRID CODING SCHEME
FOR THE HEAD AND SHOULDERS DATA.

                    BIT-RATE                        SIGNAL TO NOISE RATIO IN DECIBELS
S.N.      CLASS #                     OVER-       CLASS #                             OVER-
          1      2      3      4      ALL         1        2        3        4        ALL

1         .03    .10    .18    .40    .114        34.82    31.85    31.01    30.32    32.86
2         .075   .30    .45    .75    .25         37.35    35.95    35.65    35.03    36.52
3         .18    .70    .90    1.25   .50         39.80    39.89    40.23    39.70    39.85


Figure 5-4: Histogram of the Activity Index (IFV) of Head and Shoulders
Images. Sub-Block Size = 16 x 16.

Figure 5-6: Bit-Rate as a Function of Frame Number for Adaptive Hybrid
Coding of the Head and Shoulders Images.

frame transform coding scheme (which is a special case of the non-adaptive
hybrid scheme when α = 0). We make the following observations and comments
from the figure. (i) The SNR is almost constant for the intraframe
transform scheme. This is expected because the spatial contents of the frames
are very similar. (ii) The SNR varies greatly as a function of frame
number for the non-adaptive hybrid coding scheme. This variation is primarily
due to the variation in the amount of motion activity. The frames having
larger motion (e.g., 7 through 10) have low SNR, which is very close to
the SNR of the intraframe scheme. (iii) For the adaptive hybrid scheme,
the SNR is fairly constant (as would be expected from a good adaptive
scheme). Thus, the adaptations proposed are effective.

Figure 5-6 shows the bit-rate as a function of frame number for the
adaptive hybrid coding. The frames containing larger motion have higher
bit-rates (as would be expected). However, the bit-rate variation is not
as rapid as would be expected from a predictive coding scheme such as the
frame replenishment cluster coding described in section 3.1 (if the rate
is not controlled by the buffer fullness).

Figure 5-7 shows some of the images of frame #8 resulting from
adaptive and non-adaptive hybrid coding. Comparing the images of Figs.
5-7(a) (non-adaptive hybrid) and 4-2(a)-(i) (intraframe transform), we
notice that the non-adaptive hybrid scheme reproduces moving areas very
poorly as compared with the intraframe transform scheme. However, the
stationary areas are better reproduced. Note that the only difference
between the coding of these two images is the temporal prediction
coefficient (which is .8 for the hybrid scheme and 0.0 for the intraframe
transform scheme).


At low bit-rates (Figure 5-7(b)) we notice a shadow (or ghost)
near the sharp edges of moving areas for the adaptive hybrid scheme. If
a smoothing filter is employed in the temporal direction, this distortion
will change to blurring of the moving areas (the interframe transform
coding does that, as can be seen from Figure 4-4(c)). At higher bit-rates
the shadow effect diminishes and the visual quality of the adaptive hybrid
and adaptive interframe transform schemes is comparable. Thus, from
complexity considerations, the adaptive hybrid coding scheme is more
attractive.

On comparing the adaptive hybrid coding scheme presented here with
the one presented in [67] we note the following - (i) the mean square
performance is slightly (1-2 dB) better for the scheme presented here at
higher bit-rates (≥ .5); (ii) the scheme presented in [67] is
computationally more complex; (iii) the scheme presented in [67] has a fixed bit-rate
(overall as well as for each sub-block), while the one presented here has
a variable bit-rate; (iv) the scheme presented here reproduces moving
edges more accurately (because of the higher bit-rate in the areas
containing moving edges).

Now we present the results for the adaptive hybrid coding with
motion compensation. In these experiments only frames 5 through 9 are
used (to minimize computational costs). However, we have no reason to
believe that the results would be significantly different if all the 16
frames were employed. This is because, with motion compensation, the
interframe activity (measured by IFV) is almost a constant function of
frame number (as was shown in section 2.4). Thus, the distortion and/or
the frame bit-rate would now be (as opposed to the case of no motion
compensation) independent of frame number. Also, as is evident from
Figure 5-5, the adaptive hybrid coding scheme achieves the steady state
very fast (right after the first coded frame). For results on motion
measurements, frame skipping, interpolation, etc., for these frames, see
section 2.4.

Tables 5-5 and 5-6 give the parameters of the adaptive hybrid
scheme with motion compensation without and with alternate frame skipping,
respectively. We note the following - (i) Due to the change in the distribution
of the activity index (greatly reduced) as a result of motion
compensation, the thresholds for classification have been lowered; (ii) there
is a great improvement in the temporal correlation, as evidenced by the
values of α. We would also like to point out that the average temporal
activity (measured by the average motion in pixels/frame) of a class is not
directly evident from the value of α (which is very nearly the temporal
correlation parameter). Let h represent the average temporal activity
after motion compensation. Let ρ_i = ρ_j = ρ. Let the intraframe covariance be
given by the isotropic model. Then, an approximate value of h is given by

$$ \alpha = \rho^{h} $$

or

$$ h = \frac{\ln(\alpha)}{\ln(\rho)} = \frac{\ln\{1-(1-\alpha)\}}{\ln\{1-(1-\rho)\}} , \qquad h \approx \frac{1-\alpha}{1-\rho} . $$

Computing this quantity from the entries of Table 5-5 we note that it is
very close to .25 for all the four classes, which further supports our
conclusions in chapter II that the motion uncertainty, after motion
compensation, is fairly uniformly distributed over various classes. Thus, the

TABLE 5-5

PARAMETERS OF THE 4 CLASS ADAPTIVE HYBRID CODING SCHEME
WITH MOTION COMPENSATION FOR HEAD AND SHOULDERS DATA.

CLASS     ACTIVITY     PROBABILITY OF     CORRELATION PARAMETERS
NO.       INDEX        OCCURRENCE         ρ_i       ρ_j       α

1         0-10         .506               .985      .98       .996
2         10-20        .295               .955      .945      .99
3         20-50        .143               .91       .90       .97
4         50-          .057               .80       .78       .95

TABLE 5-6

PARAMETERS OF THE 4 CLASS ADAPTIVE HYBRID CODING SCHEME WITH
MOTION COMPENSATION, USING ALTERNATE FRAME SKIPPING AND
INTERPOLATION, FOR HEAD AND SHOULDERS DATA.

CLASS     ACTIVITY     PROBABILITY OF     CORRELATION PARAMETERS
NO.       INDEX        OCCURRENCE         ρ_i       ρ_j       α

1         0-20         .67                .98       .975      .99
2         20-40        .184               .95       .94       .97
3         40-100       .102               .91       .90       [illegible]
4         100-         .045               .80       .78       .90

TABLE 5-7

PERFORMANCE OF THE ADAPTIVE HYBRID CODING WITH MOTION COMPENSATION
FOR HEAD AND SHOULDERS FRAMES 5 THRU 9. SKIPPED FRAMES ARE
INTERPOLATED ALONG THE MOTION TRAJECTORY.

FRAMES       AVERAGE        SIGNAL TO NOISE RATIO IN DECIBELS
SKIPPED?     BIT-RATE       CODED       INTERPOLATED     OVERALL
             PER PIXEL      FRAMES      FRAMES

NO           .253           38.74       -                38.74
YES          .252           39.97       37.58            38.62
YES          .125           37.60       36.69            37.12

adaptation due to classification in this scheme is used to compensate
for the spatial non-stationarity. If the images are spatially stationary,
this scheme will reduce to the non-adaptive hybrid coding scheme with
motion compensation.
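
The statement that h is close to .25 for all four classes can be checked with a few lines of arithmetic on the entries of Table 5-5. In the sketch below the relation h = ln(α)/ln(ρ) and its first-order form (1-α)/(1-ρ) are evaluated per class; taking ρ to be the first of the two spatial correlation parameters listed for each class is an assumption made here, and the function names are hypothetical.

```python
import math

# (rho, alpha) per class, read from Table 5-5. rho is the first of the two
# spatial correlation parameters listed there (an assumption of this sketch).
TABLE_5_5 = {
    1: (0.985, 0.996),
    2: (0.955, 0.99),
    3: (0.91, 0.97),
    4: (0.80, 0.95),
}


def temporal_activity(rho, alpha):
    """h from alpha = rho**h, i.e. h = ln(alpha) / ln(rho)."""
    return math.log(alpha) / math.log(rho)


def temporal_activity_approx(rho, alpha):
    """First-order approximation h ~ (1 - alpha) / (1 - rho)."""
    return (1.0 - alpha) / (1.0 - rho)


if __name__ == "__main__":
    for cls, (rho, alpha) in TABLE_5_5.items():
        print(cls,
              round(temporal_activity(rho, alpha), 3),
              round(temporal_activity_approx(rho, alpha), 3))
```

With these entries both forms stay within roughly .2 to .35 pixels/frame for every class.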

Table 5-7 gives the coding results for the adaptive hybrid scheme
with motion compensation with and without frame skipping. At an SNR of
about 37 dB a compression gain by a factor of two can be achieved over
the adaptive scheme without motion compensation. Figure 5-8 (parts
(b)-(d)) shows the resulting images corresponding to frame 8. The visual quality
of these images is very good. The results indicate that at very low
bit-rates (.125 and below) the adaptive hybrid coding scheme with motion
compensation and frame skipping, and interpolation of skipped frames
along the motion trajectory, is very promising for high quality
encoding of motion images.

In the experiments on coding with motion compensation, the direction
of minimum distortion (DMD) method described in section 2.3 was used
with the mean square distortion criterion for the measurement of motion.

5.4.2 Chemical Plant Images: Due to very low spatial correlation
and large temporal activity, the hybrid schemes without motion compensation
(non-adaptive and adaptive) are expected to result in no significant
improvement over an intraframe transform coding scheme. Therefore, the adaptive
hybrid coding scheme with motion compensation was used. A sub-block size
of 16 x 16 was selected.

The motion measurement was done by the area correlation method with
a Fourier domain filtering given by (2-12) (see section 2.3 for details).
The value of γ = .5 was found to yield good results. For this data set,


(a) No Frame Skipping, Bit-rate = .5 bit/pixel, SNR = 30.45 dB.
(b) No Frame Skipping, Bit-rate = .25 bit/pixel, SNR = 38.51 dB.

Interframe Adaptive Hybrid Coding with Motion Compensation using Measured Statistics.

the performance of the area correlation method and the DMD method of motion
measurement is comparable (with DMD being slightly superior).

The intraframe statistics were measured (in the transform domain)
for each of the classes. Tables 5-8 and 5-9 give the parameters and the
performance of the coding scheme. Figure 5-8(a) shows an image
corresponding to frame 12. Comparing these results with those of the intraframe
transform scheme with the same sub-block size, we note an improvement of
about 2.5-3 dB. The compression gain (over the intraframe scheme) at a
distortion level of 30 dB is by a factor of about 2, and smaller at higher
distortion levels.

Thus, we see that the gains due to adaptation and motion
compensation are much lower than those for the Head and Shoulders images. From
the results of section 2.5, it is evident that frame skipping and
interpolation along the motion trajectory can be successfully used for these images
to achieve higher compression, similar to that for the Head and Shoulders
images.

5.4.3 Angiocardiogram Images - The temporal activity of these
images exhibits two characteristics of the cardiac cycle - (i) it is
periodic; (ii) it is nonuniformly distributed over a period (cardiac
cycle). Also, at a frame sampling interval of 1/30 sec., the images have
high temporal correlation.

Due to the above properties, the adaptive hybrid coding scheme
(without motion compensation) was found to be ideally suited for these
images. The images were found to be spatially stationary. This was
expected, because these images do not exhibit sharp features which are
characteristic of most video images. Therefore, adapting the intraframe
statistics to classes of different activity index is not necessary.

TABLE 5-8

PARAMETERS OF THE 4 CLASS ADAPTIVE HYBRID CODING WITH
MOTION COMPENSATION FOR THE CHEMICAL PLANT IMAGES.

ACTIVITY     PROBABILITY OF     TEMPORAL PREDICTION
INDEX        OCCURRENCE         COEFFICIENT α

TABLE 5-9

PERFORMANCE OF THE ADAPTIVE HYBRID CODING WITH MOTION COMPENSATION
FOR THE CHEMICAL PLANT IMAGES. SUB-BLOCK SIZE = 16 x 16.


Figure 5-9 shows the bit allocation at two bit-rates. The bit
allocation pattern for these images is very unusual compared with video
images. The data seem to have some characteristic frequencies (more
accurately speaking, discrete cosine transform basis vectors). An
attempt to model the statistics by any of the commonly used models would
result in a loss of these frequencies and, thereby, a probable loss of
medically useful information for the same signal to noise ratio.

Tables  5-10  and  5-11  give  the  parameters  and  the  performance  of  the 
coding  scheme.  Figure  5-10  shows  the  classification  maps  for  two  of  the 
frames.  It  is  interesting  to  note  from  Fig.  5-10  that  the  classification 
scheme  very  closely  follows  the  activity  of  the  cardiac  cycle.  During  a 
stationary  cardiac  frame  period,  the  scheme  uses  less  than  one-third  the 
average  bits/frame  and  during  an  active  period,  about  twice  the  average 
rate. 

Figure 5-11 shows two of the original frames (combined into a
single image) and their coded equivalents at some of the compression
ratios. Even at a very low bit-rate of .0625 (or a compression ratio of
128) the image quality looks fair (by evaluation of still frames). For
these images a compression ratio of 32 to 128 seems to be realizable.
An accurate reproduction of motion is required for these images. The
methods using the exchange of spatial and temporal resolution, which
are acceptable for the video images, could not be used for these images.

TABLE 5-10

PARAMETERS OF A 4 CLASS ADAPTIVE HYBRID CODING SCHEME FOR
THE ANGIOCARDIOGRAM IMAGES.

CLASS     ACTIVITY       PROBABILITY        TEMPORAL
#         INDEX          OF OCCURRENCE      CORRELATION α

1         0. - 10.       .570               .98
2         10. - 25.      .282               .90
3         25. - 60.      .103               .85
4         60. - ∞        .045               .75

TABLE 5-11

PERFORMANCE OF THE ADAPTIVE HYBRID CODING SCHEME FOR THE ANGIOCARDIOGRAM
IMAGES. SAME MEASURED STATISTICS WERE USED FOR ALL CLASSES. SUB-BLOCK
SIZE = 16 x 16.

          BIT-RATE PER PIXEL                                COMPRESSION   SIGNAL TO NOISE RATIO IN dB
S.N.      CLASS 1   CLASS 2        CLASS 3   CLASS 4   OVERALL   RATIO    CLASS 1   CLASS 2   CLASS 3        CLASS 4   OVERALL

1         .02       .05            .13       .35       .0625     128      35.95     34.34     [illegible]    33.37     35.00
2         .039      .12            .28       .70       .1250     64       38.39     37.24     36.19          36.17     37.66
3         .10       .25            .60       1.17      .250      32       40.61     39.35     38.81          38.02     39.87
4         .25       [illegible]    1.0       1.7       .50       16       42.08     41.59     40.73          39.67     41.64

(c) C.R. = 32     (d) C.R. = 16

Figure 5-11: Images resulting from data compression of angiocardiogram images. The
top half of each image approximately corresponds to an end of systole
and the bottom half to the end of the following diastole.
CHAPTER VI

DATA COMPRESSION FOR NOISY CHANNELS

In the data compression designs considered in the previous chapters
we did not consider the effects of channel errors (in transmission or
storage-retrieval). The performance of a data compression method was
evaluated assuming a noise-free channel. However, in the presence of
channel errors (bit reversals) a coding scheme designed without regard to
channel noise characteristics could yield poor to disastrous results.

A common approach for reducing the effects of channel errors has
been the use of error correcting codes [45], which aim at minimizing the
probability of bit error by introducing redundancy in the code word
(blocks 4 and 6 of Fig. 1-1). However, a better design would be to
incorporate channel characteristics in the data compression algorithm itself
(blocks 3 and 7 in Fig. 1-1), e.g., in the design of the quantizer [41],
the design of predictor coefficients for DPCM transmission [14], periodic
reinitialization of DPCM loops, etc.

Most conventional error correcting codes provide equal protection
to all the bits for a Gaussian binary symmetric channel. Often, all the
data bits do not have equal importance. For example, in transform coding
the transform coefficients have a highly uneven distribution of mean square
energy, and different bits of the same coefficient have unequal effects on
the mean square energy. In a transform image coding scheme
described in [51], certain bits which have "significant" effects on image
quality are identified and only these bits are provided protection by
using error correcting codes. However, the experimental method used
there for identifying the significant bits is tedious and does not have
any systematic quantitative formulation. Also, all the significant bits
are provided equal protection, although their effects on the image quality
vary considerably.

6.1 Channel Encoding-Decoding of a Random Variable with MSE Criterion:

Crimmins, Horwitz, et al. [18,19] have proposed an alternative
method of encoding numerical data. Their method is based on minimizing
the mean square error (MSE) due to channel noise rather than minimizing
the probability of bit-error. They find the optimum encoding and decoding
rules for transmitting a set S of equispaced and equiprobable real numbers
over a memoryless channel using certain group (or block) codes. The set
S contains K elements where K = 2^k, integer k ≥ 0, and the code words are
chosen from a given group G of order K. Each member of group G is a code
word of length n bits (n ≥ k). G is thus a subgroup of the binary group
V containing all the code words of length n. Both G and V are groups
under the exclusive-or operation (denoted by ⊕).
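
The group structure required of G can be checked mechanically. The sketch below represents n-bit code words as integers and tests whether a candidate set of K = 2^k words is a subgroup of V under exclusive-or. The function names and the example codes are hypothetical and are not taken from [18,19].

```python
def is_xor_group(words):
    """True if a non-empty set of code words is closed under exclusive-or.

    Closure is sufficient here: g ^ g = 0 supplies the identity and every
    word is its own inverse.
    """
    words = set(words)
    return bool(words) and all((a ^ b) in words for a in words for b in words)


def is_valid_code(words, n, k):
    """True if `words` is a subgroup of order 2**k of the n-bit words V."""
    words = set(words)
    return (len(words) == 2 ** k
            and all(0 <= w < 2 ** n for w in words)
            and is_xor_group(words))


if __name__ == "__main__":
    even_weight = {0b000, 0b011, 0b101, 0b110}   # n = 3, k = 2
    not_a_group = {0b000, 0b001, 0b010, 0b100}   # 001 ^ 010 = 011 is missing
    print(is_valid_code(even_weight, 3, 2), is_valid_code(not_a_group, 3, 2))
```

A search for a good G, exhaustive or otherwise, would only need to consider sets that pass this test.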

In [18,19] the decoding rule is restricted to map back into the
set S. Wolf and Redinbo [78,79] have extended these results to the case
where the optimum decoder maps into the field R of real numbers. No
method, other than exhaustive search, for finding the optimum subgroup G
has been found.

Usually the finite set S contains the quantized values of a
continuous random variable, say y. For example, if y is a random variable
representing the intensity of an image, then S = {0, 1, ..., 255} could
represent digitized values of pixels for 8 bits/pixel digitization.
Except when y is uniformly distributed, the set S of its quantized values
cannot be equispaced and equiprobable at the same time for k > 1. If y is
non-uniformly distributed and is quantized using the minimum mean square
error Max quantizer, the set S is neither equispaced nor equiprobable for
k > 1. The procedure of [79] for finding the optimum encoding rule does
not apply to such cases. This still remains an open problem. The optimum
decoding rule is still given by the conditional mean [79]. Based on our
experimental results we believe that the codes generated by the method of
[79] could be used for a nonuniformly distributed y with great advantage
over the conventional error correcting codes.
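
To illustrate that a Max quantizer for a non-uniform density yields levels that are neither equispaced nor equiprobable, the sketch below runs the usual Lloyd-Max iteration (decision levels at midpoints, output levels at cell centroids) for a zero-mean, unit-variance Gaussian density. The choice of the Gaussian density, the starting levels, and the iteration count are assumptions made for the example only.

```python
import math


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max_gaussian(K, iterations=500):
    """Return (levels, probs) of a K-level Max quantizer for N(0, 1)."""
    levels = [-2.0 + 4.0 * (i + 0.5) / K for i in range(K)]  # arbitrary start
    probs = [1.0 / K] * K
    for _ in range(iterations):
        # Decision levels lie midway between adjacent output levels.
        edges = ([-math.inf]
                 + [(a + b) / 2.0 for a, b in zip(levels, levels[1:])]
                 + [math.inf])
        probs = [cdf(hi) - cdf(lo) for lo, hi in zip(edges, edges[1:])]
        # Output levels are the centroids of their cells.
        levels = [(pdf(lo) - pdf(hi)) / pr
                  for lo, hi, pr in zip(edges, edges[1:], probs)]
    return levels, probs


if __name__ == "__main__":
    levels, probs = lloyd_max_gaussian(4)
    print([round(s, 4) for s in levels])
    print([round(p, 4) for p in probs])
```

For K = 4 the inner and outer levels are unequally spaced and the outer cells are less probable than the inner ones, which is the situation described above.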


Figure 6-1: Transmission of a continuous random variable having zero
mean and unit variance over a noisy channel.


Figure 6-1 shows a PCM transmission scheme for noisy channels. Let
α_k be a k-bit quantization function of a random variable y defined by

$$ \alpha_k(y) = s_j \quad \text{for } y_j \le y \le y_{j+1}, \quad s_j \le s_{j+1}, \qquad j = 1, 2, \ldots, K $$

where y_1 and y_{K+1} are the minimum and the maximum values of the random
variable y and

$$ y_i < y_j \quad \text{if } i < j . $$

Then it follows that S = {s_1, s_2, ..., s_K} is an ascendingly ordered set of
real numbers. Let H be the group of all k-bit binary code words arranged
in the natural order, i.e., the jth element of H is the k-bit binary
representation of the integer j. Let

$$ \gamma : S \xrightarrow{\;1\text{-}1\;} H $$

be an ordered mapping of S onto H. We would like to point out that under
this mapping the optimum encoding procedure of [79] provides maximum
protection to the most significant bit of h ∈ H and increasingly lesser
protection to the less significant bits. For most distributions and quantizers
of practical interest this results in a significantly better mean square
performance than equal bit error protection encoding. This is our
rationale behind using the mean square encoding procedure of [79] even though it
is not optimum for nonuniform distributions. Let

$$ \theta : H \xrightarrow{\;1\text{-}1\;} G , \qquad \beta : S \xrightarrow{\;1\text{-}1\;} G , $$

then β = θ ∘ γ is the encoding rule, where ∘ represents the composite
function operation.

The channel error function η transforms an n-bit code word g ∈ G
into another n-bit word v ∈ V randomly and is described by the transition
probability P(v|g). Let

$$ v = g \oplus u $$

where u is an n-bit error word. Let us assume that the channel is
memoryless, i.e.,

$$ P(v \mid g) = P(v \oplus g \mid 0) = P(u \mid 0) $$

where 0 is the identity element of V. The function η is then completely
characterized by P(u|0), ∀ u ∈ V. The decoding function λ maps the n-bit
word v into a real number y*. We define


$$
\begin{aligned}
e_q &= s - y,   & q(k)   &= E[e_q^2]/\sigma^2 \\
e_c &= y^* - s, & c(n,k) &= E[e_c^2]/\sigma^2 \\
e_t &= y^* - y, & t(n,k) &= E[e_t^2]/\sigma^2
\end{aligned}
$$

where q, c, and t are the quantization, the channel, and the total
(quantization plus channel) mean square distortion functions and σ² is the
variance of y. The optimization problem can now be stated as follows:
Given n, η and the distribution of y, find k, α, β, λ and G such that the
total distortion t(n,k) is minimized.

The problem as stated above is quite difficult, and the joint
optimization of the quantization and the encoding seems intractable at
present. A solution for a special case of the above problem has been
given by Kurtenbach and Wintz [41]. They assume k = n (which implies
G = V), fixed β (e.g., β = γ), and λ = β⁻¹, and find an optimum quantizer α.
This does not provide protection against channel errors by introducing
redundancy. The performance of such a scheme is usually not as good as that of
schemes which do provide protection by introducing redundancy.

To simplify the problem, we separate the quantization and the
encoding. We choose α to be the optimum Max quantization function [47], which
minimizes the quantization distortion q. For any given β and G the
decoder λ which minimizes the channel distortion c is given by the
conditional mean [78,79], i.e.,

$$
\begin{aligned}
\lambda(v) &= E[s \mid v] && (6\text{-}1) \\
&= \sum_{s \in S} s\,P(s \mid v) \\
&= \sum_{s \in S} s\,P(v \mid s)\,P(s)/P(v) \\
&= \sum_{s \in S} s\,P(v \mid \beta(s))\,P(s)/P(v) \\
&= \frac{\sum_{s \in S} s\,P(v \oplus \beta(s) \mid 0)\,P(s)}{\sum_{s \in S} P(s)\,P(v \oplus \beta(s) \mid 0)} && (6\text{-}2)
\end{aligned}
$$

where P(s) is the probability of the quantizer output s, which can be
calculated for any given α from the distribution of y. If α is the Max
quantization function, then it can be verified that (6-1) also implies
λ = E[y|v]. Let P(v,s) be the joint distribution of the random variables
s and v. Then the channel distortion is given by

$$
\begin{aligned}
\sigma^2\,c(n,k) &= E[e_c^2] \\
&= E[\{\lambda(v) - s\}^2] \\
&= \sum_{s \in S} \sum_{v \in V} \{\lambda(v) - s\}^2\,P(v,s) \\
&= \sum_{s \in S} \sum_{v \in V} \{\lambda(v) - s\}^2\,P(v \mid s)\,P(s) \\
&= \sum_{s \in S} \sum_{v \in V} \{\lambda(v) - s\}^2\,P(v \oplus \beta(s) \mid 0)\,P(s) . && (6\text{-}3)
\end{aligned}
$$
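
The conditional-mean decoder of (6-2) and the double sum of (6-3) can be evaluated directly for small codes. The sketch below does so under an assumed channel model, a memoryless binary symmetric channel with bit-error probability p, so that P(u|0) depends only on the number of ones in u. The levels, probabilities and code used in the example are hypothetical and are not the optimum code of [79].

```python
def bsc_error_prob(u, n, p):
    """P(u|0) for a memoryless binary symmetric channel (assumed model)."""
    w = bin(u).count("1")                 # number of reversed bits in u
    return (p ** w) * ((1.0 - p) ** (n - w))


def decoder_table(levels, probs, code, n, p):
    """lambda(v) = E[s|v] for every n-bit word v, computed as in (6-2)."""
    table = []
    for v in range(2 ** n):
        weights = [ps * bsc_error_prob(v ^ g, n, p) for ps, g in zip(probs, code)]
        total = sum(weights)
        # Words that can never be received (total == 0) get an arbitrary value.
        mean = sum(s * w for s, w in zip(levels, weights)) / total if total > 0 else 0.0
        table.append(mean)
    return table


def channel_distortion(levels, probs, code, n, p):
    """E[(lambda(v) - s)^2], the double sum of (6-3), not divided by the variance."""
    lam = decoder_table(levels, probs, code, n, p)
    return sum(
        ps * bsc_error_prob(v ^ g, n, p) * (lam[v] - s) ** 2
        for s, ps, g in zip(levels, probs, code)
        for v in range(2 ** n)
    )


# Hypothetical example: K = 4 equiprobable levels sent with the n = 3
# even-weight group code, in natural order.
LEVELS = [-1.5, -0.5, 0.5, 1.5]
PROBS = [0.25, 0.25, 0.25, 0.25]
CODE = [0b000, 0b011, 0b101, 0b110]

if __name__ == "__main__":
    for p in (0.0, 0.01, 0.1):
        print(p, channel_distortion(LEVELS, PROBS, CODE, 3, p))
```

With p = 0 the decoder returns the transmitted level and the distortion vanishes; with p = .5 the received word carries no information and the distortion equals the mean square value of the levels.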

Now we establish the following relationships between y, s, y*, e_q, e_c and
e_t.

Theorem 6-1: If α is the Max quantization function and λ is given
by (6-1), then the following hold:

$$
\begin{aligned}
\text{(i)}\quad   & E[s\,e_q] = 0   && (6\text{-}4) \\
\text{(ii)}\quad  & E[y^* e_c] = 0  && (6\text{-}5) \\
\text{(iii)}\quad & E[y^* e_t] = 0  && (6\text{-}6)
\end{aligned}
$$

Proof: Let f_Y(y) be the probability density function of y. Then
the following holds true for the Max quantization [47]:

$$ s_i = \frac{\int_{y_i}^{y_{i+1}} y\,f_Y(y)\,dy}{\int_{y_i}^{y_{i+1}} f_Y(y)\,dy} , \qquad 1 \le i \le K . \qquad (6\text{-}7) $$

Part (i) of the theorem is a well known result for the Max quantizer and
will not be proved here.
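The left hand side of (6-5) can be written as

$$ E[y^* e_c] = E[y^*(y^* - s)] = E[y^{*2}] - E[y^* s] . \qquad (6\text{-}8) $$

Now

$$
\begin{aligned}
E[y^* s] &= E[\lambda(v)\,s] \\
&= \sum_{s \in S} \sum_{v \in V} s\,\lambda(v)\,P(s,v) \\
&= \sum_{s} \sum_{v} s\,\lambda(v)\,P(s \mid v)\,P(v) \\
&= \sum_{v} \Big\{ \lambda(v)\,P(v) \sum_{s} s\,P(s \mid v) \Big\} \\
&= \sum_{v} \{ \lambda(v)\,P(v)\,E[s \mid v] \} \\
&= \sum_{v} \lambda(v)\,P(v)\,\lambda(v) \qquad \text{[using (6-1)]} \\
&= E[\{\lambda(v)\}^2] \\
&= E[y^{*2}] . && (6\text{-}9)
\end{aligned}
$$

Thus, from (6-8) and (6-9) we have

$$ E[y^* e_c] = 0 $$

which proves (6-5).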

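The left hand side of (6-6) can be written as

\begin{align}
E[y^* \cdot e_t] &= E[y^* (y^* - y)] \notag \\
&= E[y^{*2}] - E[y^* y]. \tag{6-10}
\end{align}

Since

\begin{align}
E[y^* y] &= \sum_{v \in V} \lambda(v) \, P(v) \, E[y \mid v] \notag \\
&= \sum_{v} \lambda(v) \, P(v) \sum_{i=1}^{K} \frac{\int_{y_i}^{y_{i+1}} y \, f_Y(y) \, dy}{\int_{y_i}^{y_{i+1}} f_Y(y) \, dy} \, P(s_i \mid v), \notag
\end{align}

using (6-7) we have

\begin{align}
E[y^* y] &= \sum_{v} \lambda(v) \, P(v) \Big\{ \sum_{i=1}^{K} s_i \, P(s_i \mid v) \Big\} \notag \\
&= \sum_{v} \lambda(v) \, P(v) \, E[s \mid v] \notag \\
&= \sum_{v} \lambda(v) \, P(v) \, \lambda(v) \notag \\
&= E[\{\lambda(v)\}^2] \notag \\
&= E[y^{*2}]. \tag{6-11}
\end{align}

Thus, from (6-10) and (6-11) we have

\[ E[y^* \cdot e_t] = 0 \]

which proves (6-6) and completes the proof of Theorem 6-1.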

Corollary: The errors due to the quantization and the channel noise
are uncorrelated, i.e.,

\begin{equation}
E[e_q e_c] = 0. \tag{6-12}
\end{equation}

Proof:

\begin{align}
E[e_q e_c] &= E[e_q (y^* - s)] \notag \\
&= E[e_q y^*] - E[e_q s] \notag \\
&= E[y^* (e_t - e_c)] - E[e_q s] \notag \\
&= E[y^* e_t] - E[y^* e_c] - E[s e_q] \notag \\
&= 0 \notag
\end{align}

which follows from the theorem. A direct consequence of (6-12) is

\begin{equation}
t(n,k) = c(n,k) + q(k). \tag{6-13}
\end{equation}

For a given n, the optimum value of k, k', is found by computing t(n,k)
for 0 ≤ k ≤ n and finding the minimum of t(n,k) over k. Let

\begin{equation}
d(n) = \min_{k} \{ t(n,k) \} = t(n,k'). \tag{6-14}
\end{equation}

Then d(n) vs n gives the distortion-rate function of the PCM channel. We
call the optimization of (6-14) channel optimization.
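The search in (6-14) is a one-dimensional minimization over k, which can be sketched as follows. This is an illustrative sketch only: `channel_optimize` is a hypothetical helper name, `toy_q` uses the value 4^(-k) that holds for a uniform quantizer on a unit-variance uniform density, and `toy_c` is a made-up stand-in for a channel distortion table, not data from this thesis.

```python
def channel_optimize(n, q, c):
    """Return (k_opt, d_n), where k_opt in 0..n minimizes
    t(n, k) = q(k) + c(n, k) as in (6-13) and (6-14)."""
    best_k, best_t = 0, float("inf")
    for k in range(0, n + 1):
        t = q(k) + c(n, k)  # total distortion for this split of the n bits
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t


def toy_q(k):
    """Quantizer distortion of a k-bit uniform quantizer on a
    unit-variance uniform density."""
    return 4.0 ** (-k)


def toy_c(n, k):
    """Made-up channel distortion: grows with the number of information
    bits k and shrinks with the number of redundant bits n - k."""
    return 0.05 * k / (n - k + 1)
```

With these toy functions and n = 4, the minimum falls at k' = 3, i.e., one of the four bits is spent on protection.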

For uniform distribution of y the optimum encoding function β for
a given G can be found as in [79]. The same β could be used for some other
distributions of practical interest as pointed out earlier. No simple
method for finding the optimum β for such cases has been found so far.

6.2  Coding  of  a  Random  Process  for  Noisy  Channels: 

The  concept  of  channel  optimization  for  a  single  random  variable 
could  be  extended  for  coding  images  and  other  correlated  signals  or 
random  processes.  In  particular  we  consider  the  transform  coding  method. 
With respect to channel errors, transform coding has an advantage over
the predictive or hybrid methods. Since each transform coefficient is
coded completely independently, any error due to the channel noise does
not affect the other coefficients. On the other hand, in predictive coding
the errors due to channel noise accumulate at the time of reconstruction
at the receiver. This is because, unlike the quantization errors, the
channel errors cannot be fed back in the prediction loop at the
transmitter. For optimization of prediction loops for DPCM transmission of
images over noisy channels, see [14].

Figure 6-2 shows a transform coding scheme with channel optimization.
X is an M × 1 real array and

\[ Y = \Psi X \]

is an M × 1 array of transform coefficients, where Ψ is an M × M
unitary transform. We assume for simplicity that all the transform
coefficients are real, identically distributed, and have zero mean. Let

\[ \sigma^2(m) = E[y^2(m)] \]

where y(m) is the mth transform coefficient. Let us also assume that the
columns of the transform matrix are arranged in an order such that

\begin{equation}
\sigma^2(1) \ge \sigma^2(2) \ge \dots \ge \sigma^2(M). \tag{6-15}
\end{equation}

Then the total distortion between the input array X and the
reproduced array X* is given by

\begin{align}
D &= E[(X - X^*)^T (X - X^*)] \notag \\
&= E[(Y - Y^*)^T (Y - Y^*)] \notag \\
&= \sum_{m=1}^{M} E[\{y(m) - y^*(m)\}^2] \notag \\
&= \sum_{m=1}^{M} \sigma^2(m) \cdot t(n(m), k(m)). \notag
\end{align}

Assuming that for each n(m) the optimum value k'(m) is used, we obtain

\begin{equation}
D = \sum_{m=1}^{M} \sigma^2(m) \cdot d(n(m)). \tag{6-16}
\end{equation}

We would like to minimize D subject to the constraint

\begin{equation}
\frac{1}{M} \sum_{m=1}^{M} n(m) = b \tag{6-17}
\end{equation}

where b is a given rate in bits/sample. We assume that β and G for
various values of pair (n,k) are chosen such that the distortion-rate
function d(n) is a non-increasing function of n. This condition can
always be satisfied in practice. The minimization of (6-16) subject to
(6-17) is identical to the bit allocation problem discussed in chapter IV
except that the quantizer distortion function, q, has been replaced by
the optimum total distortion function, d. However, it is not easy to
approximate d by piecewise continuous functions as has been done for the
quantizer distortion, q, for some commonly used densities [86]. Thus, the
use of integer bit allocation procedure of [86] becomes even more
important in this case.
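One simple way to attack the integer problem of (6-16) and (6-17) is a greedy marginal-return rule: hand out one bit at a time to the coefficient whose weighted distortion drops the most. The sketch below is a generic illustration and is not claimed to be the procedure of [86]; the greedy rule is only guaranteed optimal when d(n) is convex and non-increasing. The names `allocate_bits`, `total_distortion`, and `toy_d` are hypothetical, and `toy_d` is a noiseless stand-in for a tabulated d(n).

```python
def allocate_bits(variances, d, total_bits):
    """Greedy integer bit allocation for D = sum_m variances[m] * d(n[m])
    subject to sum_m n[m] = total_bits."""
    n = [0] * len(variances)
    for _ in range(total_bits):
        # Reduction in D obtained by giving one more bit to each coefficient.
        gains = [var * (d(nm) - d(nm + 1)) for var, nm in zip(variances, n)]
        best = max(range(len(n)), key=gains.__getitem__)
        n[best] += 1
    return n


def total_distortion(variances, d, n):
    """Total distortion D of (6-16) for a given allocation n."""
    return sum(var * d(nm) for var, nm in zip(variances, n))


def toy_d(n):
    """Assumed stand-in for d(n): distortion of an n-bit uniform quantizer
    on a unit-variance uniform density, without channel noise."""
    return 4.0 ** (-n)
```

For example, with variances (10, 3, 1, 0.2) and b = 1 bit/sample (four bits in total), the rule assigns (2, 1, 1, 0) bits.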

It  could  be  easily  verified  that  the  K-L  transform  would  be 
the  optimum  unitary  transform  for  the  scheme  described  above. 


6.3  Experimental  Results  and  Distortion-Rate  Functions: 

We have carried out simulations for an important class of channels,
the binary symmetric channel with probability of bit-reversal p. We
report results for the PCM transmission of a random variable (without
loss of generality we assume zero mean and unit variance) for three commonly
used densities, the Gaussian, the Laplacian, and the uniform. The uniform
density also gives the lower bounds for the quantizer distortion, q(k),
and the channel distortion normalized by the variance of s, i.e.,
c(n,k)/(1-q(k)). For quantization we use an approximation of the optimum
(Max) quantizer described in [4]. This approximation is very close to the
optimum. Table 6-1 gives the distortion-rate functions for the three
densities.

TABLE 6-1

QUANTIZER DISTORTION q(k) FOR VARIOUS DENSITIES.

[Table body not recoverable from the source. Legible fragments, in reading order: column headings 2, 3, 4, 6, 7, 8; .25; GAUSSIAN: .00261, .00017, 4x10^(...); LAPLACIAN: .1835, .0571, .0160, .00027, 7x10^(...).]

TABLE 6-2

BASIS VECTORS {g_i; i = 1,...,k} OF GROUP G FOR (n,k) GROUP CODES.

Using the algorithm outlined in [78], we have found some suitable
choices of the group G for various values of the pair (n,k). Table 6-2
lists the basis vectors of these groups. The basis vectors have been
arranged so that the encoding function has a very simple form described
below. Let, for any s ∈ S,


h  =  \(s)  h  e  H  (6-18) 

and let h_j be the jth most significant bit of h. Then the code word, g,
is given by

\begin{equation}
g = \beta(s) \equiv \beta_{n,k}(h) = \bigoplus_{j=1}^{k} h_j \cdot g_j \tag{6-19}
\end{equation}

where ⊕ denotes the exclusive-or summation, the dot represents the
binary product (or 'and' operation), and the g_j's are the basis vectors of
Table 6-2. We use (6-2) for decoding and (6-3) for the calculation of
the channel distortion.
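The encoding rule of (6-19) amounts to an exclusive-or of the basis vectors selected by the bits of h. A minimal sketch follows, with code words held as Python integers. The function name `encode` is an illustrative choice, and the (3,2) basis shown is hypothetical; it is not taken from Table 6-2.

```python
def encode(h, basis):
    """Code word g of (6-19): exclusive-or of the basis vectors g_j selected
    by the bits of the k-bit word h. basis[0] pairs with the most
    significant bit of h."""
    k = len(basis)
    g = 0
    for j, g_j in enumerate(basis):
        h_j = (h >> (k - 1 - j)) & 1  # (j+1)th most significant bit of h
        if h_j:
            g ^= g_j
    return g


# Hypothetical basis of a (3,2) group code, for illustration only.
BASIS_3_2 = [0b110, 0b011]
```

Because the map is linear over exclusive-or, `encode(a ^ b, basis)` equals `encode(a, basis) ^ encode(b, basis)`, which is what makes the set of code words a group.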

Tables 6-3 and 6-4 give the channel distortion, c(n,k), and the
total distortion, t(n,k), for the Gaussian density for p = 10^-2 and
p = 10^-3 for various values of (n,k). Tables 6-5 and 6-6 give the total
distortion for the uniform and the Laplacian densities. The channel
distortion for these densities can be easily obtained by subtracting the
quantizer distortion given in Table 6-1 from the total distortion.

Table 6-7 shows the effects of a proper choice of β. The normal mapping
here corresponds to the mapping obtained by the procedure of [78] which
is optimum for the uniform density.

We note from (6-2) that the decoder depends upon the channel
bit-reversal probability p, while from (6-18) and (6-19) we note that
the encoder is independent of p for a given G. In practice p might vary
from time to time and thus cannot be known exactly. So it becomes
necessary to know the robustness of the scheme as p deviates from the
design value. Table 6-8 shows that the scheme is indeed quite robust
to a wide variation in p.

and would perform somewhere in between, but still much inferior to the optimum m.s.e. coding.

Figure  6-3  shows  the  effect  of  varying  k  on  the  total  distortion 
for  a  fixed  rate  (or  n).  The  minima  of  the  curves  correspond  to  the 
optimum  value  of  k.  Figure  6-4  gives  the  distortion-rate  functions  for 
various  densities  with  channel  optimization.  We  notice  that  as  the  channel 
becomes noisier, the distortion-rate curves start flattening.

We have also evaluated the performance of the scheme for two
important classes of discrete random processes. The first one is a
one-dimensional first order Markov process with one step correlation
parameter ρ = .95. For this process the discrete Cosine transform has been
known to perform very close to the K-L transform [2]. Hence the matrix
Ψ has been chosen to be the discrete Cosine transform [2]. Figure 6-5
shows the distortion-rate curves for this process for Gaussian distribution.

The second class is the 2-D random field with the isotropic
covariance model given by (A-2) with ρ_i = ρ_j = .95. Once again, we use the
discrete Cosine transform, because as shown in Appendix B it
performs very close to the K-L transform. Figure 6-6 shows the
distortion-rate curves for this process with Gaussian distribution for array sizes
of 16 × 16 and 64 × 64. Figure 6-7 shows the bit assignment pattern for
the 16 × 16 array size at 1 bit/sample rate. Figure 6-8 shows the
percentage of bits assigned for channel error protection (or redundancy) as
a function of rate and array size. We notice that for low channel noise
(p = 10^-3) this percentage is almost constant for different rates as well
as array sizes. Even for high levels of channel noise (p = 10^-2) the
variation is not too large. Another noteworthy fact is that the channel
optimization scheme requires far fewer redundant bits for channel error
protection than the conventional error correcting codes would require.

Figure 6-7. Bit-Assignment for 16 × 16 Cosine Transform Coding for a 2-D Isotropic
Random Field with ρ = .95. Bit-Rate = 1 Bit/Sample.
(a) Bit-Assignment for Quantization, k(i,j).
(b) Additional Bits for Channel Error Protection, n(i,j) - k(i,j).

As mentioned in chapter IV, for most images the transform
coefficients could be assumed to be Laplacian distributed. So we have also
calculated the distortion-rate functions of the 2-D isotropic covariance
model for this distribution and compared the channel optimization scheme
(Scheme-1) with a (15,11) single error correction encoding scheme (Scheme-2)
and a scheme with no error correction (Scheme-3). Figure 6-9 shows the
distortion-rate curves for 16 × 16 array size for these three schemes.
The distortion-rate curve for Scheme-2 for p = 10^-3 has been obtained
assuming that the effects of two and more errors in a 15 bit code could be
neglected due to their very low probability. Thus the curve is somewhat
optimistic and clearly the actual performance of the Scheme-1 relative
to the Scheme-2 would be even better than what is shown in Figure 6-9.
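The size of the neglected term can be checked with a short binomial calculation: the probability that a 15 bit word suffers two or more bit reversals on a binary symmetric channel. This is a sketch added for illustration; the function name is an arbitrary choice, and bit errors are assumed independent.

```python
def prob_two_or_more_errors(n, p):
    """Probability of at least two bit reversals in an n-bit word sent over
    a binary symmetric channel with independent bit-reversal probability p."""
    p_none = (1.0 - p) ** n
    p_one = n * p * (1.0 - p) ** (n - 1)
    return 1.0 - p_none - p_one
```

For n = 15 and p = 10^-3 this gives roughly 1.0 x 10^-4 per code word, compared with about 1.5 x 10^-2 for a single error.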

Roughly we can conclude that the performances of the Scheme-1 for p = 10^-2,
that of the Scheme-2 for p = 10^-3, and that of the Scheme-3 for p = 10^-4
are close to each other.

The results of the previous sections were applied for coding a
256 × 256 Girl image originally digitized to 8 bits/sample. An isotropic
covariance model with ρ_i = ρ_j = .95 and Laplacian distribution for the
transform coefficients were assumed. The array (or sub-block) size of
16 × 16 and the Cosine transform were chosen in the coding algorithm. The
performance of the Scheme-1 and the Scheme-3 were evaluated for p = 10^-2,
10^-3 at 1 bit/pixel.

TABLE 6-9

PERFORMANCE OF DATA COMPRESSION SCHEMES AT 1 BIT/PIXEL
FOR COSINE TRANSFORM CODING OF 256 × 256 GIRL IMAGE.
BLOCK SIZE = 16 × 16.

SCHEME      p=0        p=.001     p=.01
SCHEME-1    31.90 dB   31.40 dB   29.85 dB
SCHEME-3    31.90 dB   25.96 dB   20.05 dB

Table 6-9 gives the signal to noise ratio (SNR) and Figure 6-10
shows the original and the coded images. Figure 6-11 shows various
absolute error images amplified ten times. Since the effect of a bit
reversal is localized within a sub-block of an image, we call it "blocking
effect". From Figure 6-10 we see that for Scheme-1 the performances at
p = 0 and p = 10^-3 are almost indistinguishable and at p = 10^-2 the
blocking effects of channel noise are somewhat visible. For
Scheme-3 (which provides no channel noise protection), on the other hand, the
blocking effects are quite visible even for p = 10^-3 and very prominent
at p = 10^-2.

The results of Table 6-9 are in excellent agreement with those of
Figure 6-9. Since Figure 6-9 gives the distortion normalized by the
variance, while the SNR is normalized by the peak-to-peak signal energy,
the SNR for an image could be obtained from Figure 6-9 by subtracting
the mean square error in decibels from a constant

\[ c = 10 \log_{10} \frac{(255)^2}{\sigma^2} \ \text{dB} \]

where σ² is the variance of the image. For the Girl image we get

\[ c = 15.42 \ \text{dB}. \]

Thus from Figure 6-9 we obtain the SNR for the Girl image for Scheme-1
at p = 10^-3 and 1 bit/pixel as

\begin{align}
\text{SNR} &= 15.42 - (-15.48) \ \text{dB} \notag \\
&= 30.90 \ \text{dB} \notag
\end{align}

which is very close to the actual performance given in Table 6-9 as
31.40 dB. Thus the model used for the data compression of the Girl image
seems to be realistic.

[Figure 6-10. Images Resulting From 16 × 16 Cosine Transform Coding and Transmission over a Binary Symmetric Channel. Legible panel labels: (a) Original 256 × 256 image, 8 bits/pixel; (e) Coded at 1 bit/pixel, Scheme-3; (f) Coded at 1 bit/pixel, Scheme-1.]

[Figure 6-11. Error Images Corresponding to the Coded Images of the Previous Figure. Legible panel labels: (d) Channel noise, p = 10^-2, Scheme-3; (e) Channel noise, p = 10^-2, Scheme-1.]
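The conversion from a variance-normalized mean square error to a peak-signal SNR is a one-line computation, sketched below. The function name and arguments are illustrative. The numerical image variance is not legible in the source, so the usage note back-computes it from the stated constant of 15.42 dB purely for checking the arithmetic.

```python
import math


def peak_snr_db(normalized_mse_db, variance, peak=255.0):
    """SNR in dB relative to the peak signal, given the mean square error
    normalized by the image variance (in dB) and the image variance."""
    c = 10.0 * math.log10(peak ** 2 / variance)  # constant c of the text
    return c - normalized_mse_db
```

With a variance chosen so that c = 15.42 dB (about 1.87 x 10^3 for a peak of 255) and a normalized error of -15.48 dB, the function returns 30.90 dB.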

It is also apparent by viewing Figure 6-11 that there is a marked
difference between the distribution and the visual effects of the two
sources of errors, i.e., the quantization and the channel noise. Thus it
might be desirable to assign different weights to these errors. This
could be easily incorporated in our scheme by defining the total distortion
as

\[ t(n,k) = q(k) + w_c \cdot c(n,k) \]

and then performing channel optimization as before. A suitable value of
weighting coefficient w_c has to be found experimentally.

The  concept  of  channel  optimization  can  also  be  extended  to 
hybrid  coding.  This  and  the  application  to  interframe  coding  have  been 
left  for  future  research. 


CHAPTER  VII 


SUMMARY,  CONCLUSIONS,  AND  SUGGESTIONS  FOR  FUTURE  WORK 
7.1 Summary

The new results presented in this thesis are summarized as
follows.

A hypothesis that the temporal dimension of most video motion
images consists mainly of a deterministic component, called motion, was
presented. A method for the visual characterization of the deterministic
component in a stationary mode, based on the temporal cross-sections,
was described. A piecewise linear translation model for the motion
trajectory estimation was developed. Based on this model, some simple
relationships to calculate statistical parameters of the random component
were derived.

A new technique for efficient estimation and coding of the
deterministic component was presented. The experimental results of application
of this technique to actual image data (Head and Shoulders and Chemical
Plant) show that it gives very good estimates and that the piecewise
linear translation approximation on a sub-block (of suitable size) basis
is reasonably good.

The registration of successive frames, called motion compensation,
results in a tremendous improvement in prediction (about 10 to 12 dB
decrease in interframe variance), and the remaining motion uncertainty
in the areas of motion is approximately uniformly distributed between 0
and 0.5 pixels along both the spatial axes. This high degree of
registration results in temporal bandwidth reduction, and permits reducing the
sampling rate of the temporal axis (i.e., frame skipping). The missing
samples (or frames) can be fairly accurately reproduced by zeroth or
first order linear interpolation.

Simple first order Markov covariance models, e.g., separable,
result in very poor performance for a transform coder (interframe as well
as intraframe) compared with the measured statistical models. Also, the
simple (or non-adaptive) transform coders, which are based on
approximation of image random process by a wide sense stationary process,
result in poor performance for motion images (mostly in reproduction of
sharp edges and motion). A significant improvement can be achieved by
an adaptive scheme which approximates the nonstationary process by four
piecewise stationary processes. However, for the biomedical projection
images the assumption of wide sense stationarity is reasonable.

Comparison of the non-adaptive intraframe and interframe schemes
for video motion images shows that the mean square performance of the
interframe scheme having a sub-block size of 16 × 16 × 16 can be matched
by an intraframe scheme having a sub-block size of 64 × 64 (the total
array size for both is the same).

The x-ray projection images have high correlation and the
interframe transform coding of these results in very high compression. The
effect of the distortion in the projection images on the reconstruction
of the 3-D object has been evaluated by reconstructing some transaxial
cross-sections (or levels) of the object. The results show that high
compression ratios are achievable on these images.

The performance of a hybrid scheme for the video motion images can
be significantly improved by adapting the bit-rates and the statistics
to the local variations in the spatial and temporal characteristics. A
simple criterion for this adaptation is proposed. Once again, this is
achieved by approximating the nonstationary process by four piecewise
stationary processes.

The incorporation of motion compensation and 2:1 subsampling of
temporal axis results in further significant improvement in the
performance of the adaptive hybrid coding. For the Head and Shoulders images
high quality images (SNR = 37 dB) are obtained at .125 bit/pixel, i.e., a
compression ratio of 64 is realized.

The adaptive hybrid coding (without motion compensation) results
in very high compression ratios for the angiocardiogram x-ray motion
images. Compression ratios of 32-128 seem to be realizable based on the
evaluation of still images. Spatial statistics of these images are
represented very well by stationary models.

A method for the joint optimization of source coding and channel
coding for PCM transmission over noisy channels has been presented. It
was shown how this method could be applied to image transform coding.
The rate distortion curves and the experimental results on images show
that this method performs significantly better than the conventional
error correcting codes or schemes with no channel protection. For example,
at a rate of 1 bit/pixel and channel error probability 10^-2, the proposed algorithm
improves the performance of an ordinary transform coder (designed for noise
free channel) by almost 10 dB.

The performance of the K-L, Cosine, Sine, Fourier, and Hadamard
transforms for several commonly used intraframe nonseparable covariance
models has been compared for an array size of 8 × 8. The results
indicate that for all these models (isotropic [53], NCI [34], and measured
covariance of a Girl image) the Cosine transform performs very close to
the optimum K-L transform, and the remaining transforms perform close to
each other. The performance of the K-L is about .05 dB better than the
Cosine and about 1 dB better than the remaining transforms. Earlier
results have shown the near optimality of the Cosine transform for
separable covariance model only.

7.2  Conclusions  and  Recommendations  for  Future  Work 

Based  on  the  results  and  experimental  evidence  presented  in  this 
thesis,  we  make  the  following  conclusions  and  recommendations  for  further 
investigation. 

For multiframe motion images considered here, the motion between
successive frames can be very closely approximated by piecewise linear
translation of sub-blocks of size 8 × 8 to 32 × 32 with an average
accuracy of .25 pixel.

The interpolation of skipped frames along motion trajectory
(obtained by above approximation) results in excellent encoding of the
skipped frames. Thus, we conclude that the bandwidth of the temporal
domain can be significantly reduced by motion compensation.

The logarithmic search method of direction of minimum distortion
(DMD) could also be useful in many other applications of image
registration, e.g., terminal guidance, template matching. This will be a subject
of our future research.

The performance of transform coding is highly dependent on the
statistical model, especially at high bit-rates. Measuring the
statistics in transform domain results in a significant improvement in
performance (2-4 dB). We have not addressed the question of how often the
statistics need to be measured for a given application. It has been
left for future investigation.

The mean square performance of the nonadaptive intraframe and
interframe transform coding schemes is comparable for equal total array
size of the sub-block, thus making the interframe transform coding
unattractive for motion images. Further investigation is required to
establish the effect of array size on coder performance.

The adaptive variable bit-rate transform and hybrid coders have
much better performance. The result is improved by 4 dB over the
nonadaptive schemes, and the motion and the sharp features are better
reproduced.

Motion compensation and alternate frame skipping, with
interpolation of skipped frames along motion trajectory, result in a further
compression gain by a factor of two. Higher compression gain seems likely
by further reducing the sampling along the temporal axis and
interpolation along motion trajectory. Although we have only applied the motion
compensation method of chapter II to hybrid coding, this and subsampling
of temporal axis can also be used with the predictive coding schemes of
chapter III and similar gains are expected. Application of motion
compensation to 3-D transform coding seems to be difficult.

The joint optimization of the source coding and the channel coding
results in significant improvement in performance of a coding scheme.
The concept of channel optimization for PCM transmission can be easily
extended to DPCM and thus to the hybrid coding methods.

The Cosine transform performs very close to the optimal K-L and
its many computational advantages over the K-L make it a better choice for
image data compression for a variety of random fields.




Some of the techniques of video bandwidth compression can be
applied to the biomedical x-ray images with very high compression. Further
research in this area is needed to more qualitatively evaluate the effects
of distortion on the medically useful information.




APPENDIX  A 


MODELING  OF  INTRAFRAME  IMAGE  STATISTICS 


A.1 Covariance Models for 2-D Images:

The second order statistics of images are required for many image
processing applications, e.g., restoration and coding. Assuming that the
images belong to 2-D stationary random fields, a widely used model for
image covariances is the separable model given by

\begin{equation}
r_{m,n} = E[u_{k,\ell} \, u_{k \pm m, \ell \pm n}] = (\rho_i)^m (\rho_j)^n, \qquad m, n \ge 0, \quad |\rho_i| < 1, \quad |\rho_j| < 1 \tag{A-1}
\end{equation}

where u_{i,j} is the intensity of the (i,j)th pixel and ρ_i and ρ_j are one step
correlation parameters along indices i and j. Without loss of generality
we have assumed images having zero mean and unity variance.

Although the model of (A-1) results in a very simple mathematical
analysis, it is known to be a poor approximation of the actual image
covariances [34]. Another image model, which is called the isotropic covariance
model [53,66], is known to be a better approximation for most images but
has not been used widely so far because of resulting difficulties in
analysis. It is given by

\begin{equation}
r_{m,n} = \exp\left\{ -\sqrt{\alpha_i m^2 + \alpha_j n^2} \right\}, \qquad m, n \ge 0 \tag{A-2}
\end{equation}

where

\[ \alpha_i = (\ln\{\rho_i\})^2 ; \qquad \alpha_j = (\ln\{\rho_j\})^2 \]

and ρ_i = ρ_j if the images are sampled at the same rate along both the axes.

Equation (A-2) then simply means that the correlation between any two
image points is an exponentially decreasing function of their geometric
distance, while in (A-1) it is the sum of the horizontal and vertical
distances. From this statement it is clear that (A-2) would be a better
model for most images. In [35] it has been demonstrated that the models
which closely approximate (A-2) give much better performance in filtering
images than that of (A-1).
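The difference between the two models is easy to see numerically. The sketch below evaluates (A-1) and (A-2) for given lags; the function names are illustrative, and correlation parameters in the open interval (0, 1) are assumed so that the logarithm is defined.

```python
import math


def separable_cov(m, n, rho_i, rho_j):
    """Separable model (A-1): decays with the sum of the two lag distances."""
    return (rho_i ** abs(m)) * (rho_j ** abs(n))


def isotropic_cov(m, n, rho_i, rho_j):
    """Isotropic model (A-2): decays with the geometric distance."""
    a_i = math.log(rho_i) ** 2
    a_j = math.log(rho_j) ** 2
    return math.exp(-math.sqrt(a_i * m * m + a_j * n * n))
```

Along either axis the two models coincide; they differ off the axes. For ρ_i = ρ_j = .95 and a diagonal lag (1,1), the separable model gives .95² = .9025, while the isotropic model gives .95 raised to the power √2, which is about .930.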

A model based on a finite difference approximation of an elliptical
partial differential equation, reported in [34] and referred to as the NCI model
[34,35], has been found very useful in modeling image statistics. It is
a four point nearest neighbor non-causal (NC) model represented by the
relationship

\begin{equation}
u_{i,j} = \alpha \, (u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1}) + \epsilon_{i,j} \tag{A-3}
\end{equation}

where {ε_{i,j}} is a zero mean, moving average field whose covariance function
is

\begin{equation}
E[\epsilon_{i,j} \, \epsilon_{m,n}] = \beta \, (\delta_{i-m} \, \delta_{j-n} - \alpha \gamma \, \delta_{i-m \pm 1} \, \delta_{j-n \pm 1}) \tag{A-4}
\end{equation}

and δ is the Kronecker delta function. Suitable values of α, β and γ
could be found for a class of images. The application of this model for
filtering and data compression could be found in [35] and [75] respectively.
The calculation of covariances generated by this model is described in [34].

Sometimes a direct model of covariances is obtained by measuring
these quantities for a given image data as follows. Let K × L be the
size of a window over which the covariances are desired and M × N be the
size of data array, U, such that

\[ K \ll M \quad \text{and} \quad L \ll N. \]

Then, we define

\begin{equation}
r_{m,n} = \frac{\sum_{i=1}^{M-K} \sum_{j=1}^{N-L} u_{i,j} \, u_{i+m,j+n}}{\sum_{i=1}^{M-K} \sum_{j=1}^{N-L} u_{i,j}^2}, \qquad 0 \le m \le K-1, \quad 0 \le n \le L-1. \tag{A-5}
\end{equation}
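A direct estimate of this kind, normalized so that the zero-lag term equals one, can be sketched as follows. The sketch assumes a zero-mean data array stored as a list of rows and sums over the (M-K) × (N-L) region in which every shifted sample stays inside the array; the function name and the zero-based indexing are illustrative choices.

```python
def measured_covariance(u, K, L):
    """Normalized covariance estimate over a K x L window of lags from an
    M x N zero-mean array u (list of rows), with r[0][0] equal to 1."""
    M, N = len(u), len(u[0])
    rows = range(M - K)
    cols = range(N - L)
    energy = sum(u[i][j] ** 2 for i in rows for j in cols)
    r = [[0.0] * L for _ in range(K)]
    for m in range(K):
        for n in range(L):
            acc = sum(u[i][j] * u[i + m][j + n] for i in rows for j in cols)
            r[m][n] = acc / energy
    return r
```

As a sanity check, a checkerboard of +1 and -1 values gives estimates of +1 for even total lag m + n and -1 for odd total lag.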


A. 2  Computation  of  Transform  Coefficient  Variances: 

For  intrafrarae  transform  coding,  as  well  as  for  interframe  hybrid 
coding, we  need  to  know  the  statistics  of  images  in  the  transform  domain, 
particularly  the  variances  of  the  transform  coefficients.  Let  U  denote 
an  M  X  N  block  of  an  image,  denote  an  L  x  L  unitary  transform,  V  denote 

Li 

the  M  X  N  array  of  the  transform  coefficients  of  U,  and  W  the  array  of  the 
variances  of  transform  coefficients.  Then 


and 


V  . 

w  =  E[(v.  .)^'  I-lilM,  l<j<N. 


Let  bar  on  the  top  of  an  array  represent  lexicographic  ordering  of  the 

elements  into  a  one  dimensional  array  and  R  =  {a,  .  :  0  1  M-1, 

1 » J 

0  £  j  N-l}  be  an  M  X  N  covariance  matrix  of  the  image  random  field.  We 
wish  to  find  the  elements  of  W, given  R.  From  the  above  definitions,  it 
could  be  easily  seen  that 


and 


V  = 

W  -  Diag.{E[Vv’l} 


(A-6) 
(A- 7) 
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where  0  denotes  the  Kronecker  product  of  matrices  and  Diag.{B}  represents 
a  one  dimensional  array  containing  the  diagonal  elements  of  a  square  matrix 
B.  From  (A-6)  and  (A-7)  we  have 

W  =  Diag.{(ii)^  0  E  [U  U^]  0 

or  W  =  Diag.{('i;^0i/;j^)^  (A-8) 

where  ^  is  a  N  x  N  symmetric  block  Toeplitz  matrix  whose  each  element  is 
an  M  X  M  symmetric  Toeplitz  matrix.  The  elements  of  (ft  are  given  by 


i.j  ;k,J, 


=  fL\ 


k-J-I  ,  i  i-j  1 


1  <  l,j  <  N 


1  <  k,£  <  M 


(A-9) 


where the first two subscripts of $\mathcal{R}$ refer to the addresses of
the blocks and the last two refer to the addresses of the elements within
a block.

Thus, one can calculate the transform coefficient variances by
appropriately taking the transform of $\mathcal{R}$. We have also found an
efficient algorithm for computing (A-8) which exploits the Toeplitz
structure of $\mathcal{R}$ and the fact that only the diagonal elements of
its transform are needed. This will be published elsewhere.
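The computation in (A-8) and (A-9) can be sketched as follows. This is
only an illustration under stated assumptions, not the efficient algorithm
mentioned above: the transform is taken to be real (an orthonormal DCT-II
built by the hypothetical helper `dct_matrix`), the lexicographic ordering
is taken to be column by column, and the input covariance is a separable
first-order Markov model chosen purely as test data. The block Toeplitz
matrix is never formed; its entries are read directly from R.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row p holds the p-th basis vector."""
    rows = []
    for p in range(n):
        scale = math.sqrt((1.0 if p == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * k + 1) * p / (2 * n))
                     for k in range(n)])
    return rows


def coefficient_variances(r, psi_m, psi_n):
    """Return W (M x N), the diagonal of (psi_n (x) psi_m) R (psi_n (x) psi_m)^T.

    r[m][n] is the covariance at lag (m, n). The block Toeplitz matrix of
    (A-9) is not stored; its entry r[|k-l|][|i-j|] is looked up on demand.
    """
    M, N = len(psi_m), len(psi_n)
    w = [[0.0] * N for _ in range(M)]
    for p in range(M):
        # Collapse the within-block indices k, l once per row index p:
        # g[d] = sum over k, l of psi_m[p][k] * psi_m[p][l] * r[|k-l|][d]
        g = [sum(psi_m[p][k] * psi_m[p][l] * r[abs(k - l)][d]
                 for k in range(M) for l in range(M))
             for d in range(N)]
        for q in range(N):
            w[p][q] = sum(psi_n[q][i] * psi_n[q][j] * g[abs(i - j)]
                          for i in range(N) for j in range(N))
    return w


# Illustrative input only (an assumption): unit-variance separable
# first-order Markov covariance, r[m][n] = rho^(m + n).
rho = 0.95
M = N = 8
r = [[rho ** (m + n) for n in range(N)] for m in range(M)]
W = coefficient_variances(r, dct_matrix(M), dct_matrix(N))
```

Because the transform is unitary, the entries of W should sum to MN times
the pixel variance, which gives a quick sanity check on any implementation.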

Table A-1 shows the cosine transform coefficient variances for a
sub-block size of 16 x 16 measured over the 16 frames of the Head and
Shoulders data set. Each sub-block was first transformed by a discrete
Cosine transform, and then the variance of each transform coefficient was
measured over all the data sub-blocks. Table A-2 shows a 2-D 16 x 16
covariance matrix, R, corresponding to the model of (A-2), and Table A-3
shows its corresponding Cosine transform coefficient variance matrix, W.
Comparing the corresponding entries of Tables A-1 and A-3, the ratio is
not far from unity for the lower order coefficients, while for the higher
order coefficients it deviates substantially from unity.
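The measurement procedure described for Table A-1 can be sketched as
below. The helper names (`dct_matrix`, `dct2`, `measured_variances`) are
hypothetical, the blocks are tiny synthetic arrays rather than image data,
and the sketch assumes the variance is taken about the sample mean of each
coefficient with a divisor equal to the number of blocks; the text does not
say which estimator was actually used.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row p holds the p-th basis vector."""
    rows = []
    for p in range(n):
        scale = math.sqrt((1.0 if p == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * k + 1) * p / (2 * n))
                     for k in range(n)])
    return rows


def dct2(block, cm, cn):
    """Separable 2-D transform V = cm * block * cn^T of an M x N block."""
    M, N = len(cm), len(cn)
    tmp = [[sum(cm[p][k] * block[k][j] for k in range(M)) for j in range(N)]
           for p in range(M)]
    return [[sum(tmp[p][j] * cn[q][j] for j in range(N)) for q in range(N)]
            for p in range(M)]


def measured_variances(blocks):
    """Variance of each transform coefficient, measured across all blocks."""
    M, N = len(blocks[0]), len(blocks[0][0])
    cm, cn = dct_matrix(M), dct_matrix(N)
    coeffs = [dct2(b, cm, cn) for b in blocks]
    count = len(coeffs)
    var = [[0.0] * N for _ in range(M)]
    for p in range(M):
        for q in range(N):
            mean = sum(v[p][q] for v in coeffs) / count
            var[p][q] = sum((v[p][q] - mean) ** 2 for v in coeffs) / count
    return var


# Two flat 4 x 4 blocks: only the DC coefficient differs between them, so
# only var[0][0] should come out nonzero.
blocks = [[[1.0] * 4 for _ in range(4)], [[3.0] * 4 for _ in range(4)]]
var = measured_variances(blocks)
```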

The consequence of the above is that at lower bit-rates, where no
bits are assigned to most higher order coefficients, the performance of
coders using the variances of Table A-1 or Table A-3 would be very close,
and hence (A-2) is a good model. But at higher bit-rates, the variance
distribution of Table A-3 would tend to assign bits to higher order
coefficients unnecessarily. We have found experimentally that for high
resolution smooth images, which have very low variance for higher order
coefficients, a correction factor applied to the transform coefficient
variances resulting from the model of (A-2) improves the performance at
high bit-rates considerably. One such correction factor, for the Head and
Shoulders data, is given by


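[Equation (A-10) is illegible in the source scan; only the fragments
$h\,w_{i,j}(\rho)$, the parameter b, and the constant 50 survive.]   (A-10)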


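where $\rho = (\rho_1 + \rho_2)/2$ and h is chosen such that

$$\sum_{i=1}^{M}\sum_{j=1}^{N} \hat{w}_{i,j} = \sum_{i=1}^{M}\sum_{j=1}^{N} w_{i,j},$$

with $\hat{w}_{i,j}$ denoting the corrected variances.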

Table A-4 shows the matrix of Table A-3 with the above correction factor.
We can see that the entries of Tables A-1 and A-4 are close, and hence the
model of (A-2) with the correction factor of (A-10) is a better model for
coding images belonging to the same class as the Head and Shoulders
images than (A-2) with no correction. This correction factor is data
dependent, but a suitable value of the parameter b in (A-10) could be
found for other data.


TABLE A-1. TRANSFORM COEFFICIENT VARIANCES FOR 16 x 16 SUB-BLOCK SIZE COSINE
TRANSFORM MEASURED OVER THE 16 FRAMES OF HEAD AND SHOULDERS DATA.

[Table entries are illegible in the source scan; the captions and entries
of the remaining tables of Appendix A are likewise not recoverable.]


APPENDIX  B 

COMPARISONS  OF  2-D  TRANSFORMS 

Several discrete unitary transforms have been used for intraframe
and interframe transform coding of images. These include the Karhunen-Loeve
(or K-L), Fourier, Cosine, Sine, Hadamard, Haar, and Slant transforms. Of
these, the K-L transform is the optimum transform for data compression (the
performance criterion is discussed in section B.1) and is dependent on the
statistics of the data. The remaining transforms are data independent
and also have FFT-type fast computational algorithms. For these reasons,
the others are preferred over the K-L in practice.

For a class of one-dimensional signals, i.e., first order Markov
processes with high correlation, the discrete Cosine transform (or DCT) is
known to perform very close to the K-L transform [2,33]. Since the 2-D
DCT is defined as a separable product (i.e., Kronecker product) of the
one-dimensional DCT, it follows from the above that it will perform very
close to the 2-D K-L transform for a 2-D separable first order Markov
field given by (A-1), for highly correlated data such as images. Although
the separable model of (A-1) has been used for data compression [67,68,87],
for reasons discussed in Appendix A, nonseparable models are preferable
in many cases. Therefore, the nonseparable models described by (A-2),
(A-3), (A-4), (A-5) and others have been used for data compression [49,53,
67,68,75] with better results than the separable model. The most commonly
used transforms in these studies are the Cosine and the Hadamard, the
former for its better performance and the latter for its simplicity.
However, no theoretical or experimental evidence exists for the relative
performance of various transforms for nonseparable fields.

We have done some evaluations of the performance of the K-L, Cosine,
Sine, Fourier, and Hadamard transforms for a number of commonly used
nonseparable covariances. Our results show that the Cosine transform
performs extremely close to the optimum K-L transform.

B.1 Transforms and Their Performance Measure:

Let $\bar{U}$ denote an MN x 1 vector obtained by lexicographic
ordering of a real M x N 2-D array U. We define its transform by

$$\bar{V} = A\,\bar{U}. \tag{B-1}$$

If the transform matrix A can be written as

$$A = A_1 \otimes A_2 \tag{B-2}$$

where $A_1$ and $A_2$ are M x M and N x N matrices respectively, then it is
called a separable transform and (B-1) can be written as

$$V = A_1\,U\,A_2^{T} \tag{B-3}$$

where V is an M x N array obtained by inverse lexicographic ordering of
$\bar{V}$. The K-L transform, characterized by the maximum mean square
energy compaction property, consists of the eigenvectors of the matrix
$\mathcal{R}$ defined by (A-9). Note that the K-L transform $A_{KL}$
corresponding to an arbitrary covariance matrix which is not separable is
itself not separable. Let

$$\sigma_k^{2} = E[\bar{v}_k^{2}], \qquad 1 \le k \le MN$$

and let the rows of A be arranged such that

$$\sigma_1^{2} \ge \sigma_2^{2} \ge \cdots \ge \sigma_{MN}^{2}. \tag{B-4}$$

Note that the sequence $\{\sigma_k^{2}\}$ consists of the elements of the
matrix W defined in Appendix A.

We restrict our attention to transforms which are unitary
(all the transforms discussed above are unitary), i.e.,

$$A^{-1} = A^{*T}.$$

For the K-L transform, the sequence $\{\sigma_k^{2}\}$ is nothing but the
eigenvalues of $\mathcal{R}$ arranged in descending order.

We define the performance of a transform by a sequence of basis
restriction errors $\{b_i : 0 \le i \le MN-1\}$ defined by

$$b_i = \frac{\sum_{k=i+1}^{MN} \sigma_k^{2}}{\sum_{k=1}^{MN} \sigma_k^{2}}, \qquad 0 \le i \le MN-1. \tag{B-5}$$

Each $b_i$ represents the normalized minimum mean square error if only i of
the transform coefficients are retained. For the K-L transform the
sequence $\{b_i\}$ is minimum, i.e.,

$$b_i^{KL} \le b_i \tag{B-6}$$

or, equivalently,

$$\sum_{k=1}^{i} \left(\sigma_k^{KL}\right)^{2} \ge \sum_{k=1}^{i} \sigma_k^{2}. \tag{B-7}$$

Thus, the K-L transform is optimum in the sense that it minimizes the
mean square error when some of the transform coefficients are discarded.
Since $\sigma_k^{2}$ represents the mean square energy of the transform
coefficients, the property (B-7) is called the maximum mean square energy
compaction property of the K-L transform.
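As a small illustration of the basis restriction error of (B-5), the sketch
below evaluates it for a one-dimensional first-order Markov covariance with
correlation .95 and block length 8, once for the Cosine transform and once
for no transform at all (identity). The one-dimensional setting, the
parameter values, and the helper names are assumptions made for brevity;
the comparisons reported in this appendix are two-dimensional and include
the K-L transform, which is not computed here.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row p holds the p-th basis vector."""
    rows = []
    for p in range(n):
        scale = math.sqrt((1.0 if p == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * k + 1) * p / (2 * n))
                     for k in range(n)])
    return rows


def transform_variances(cov, a):
    """Diagonal of a * cov * a^T for a real transform a (rows = basis vectors)."""
    n = len(cov)
    return [sum(a[p][k] * cov[k][l] * a[p][l]
                for k in range(n) for l in range(n))
            for p in range(n)]


def basis_restriction_errors(variances):
    """Return [b_0, ..., b_{n-1}] of (B-5) after sorting variances descending."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    errors, tail = [], total
    for v in ordered:
        errors.append(tail / total)   # energy left when this many are kept
        tail -= v
    return errors


n, rho = 8, 0.95
cov = [[rho ** abs(k - l) for l in range(n)] for k in range(n)]
identity = [[1.0 if k == l else 0.0 for l in range(n)] for k in range(n)]

b_dct = basis_restriction_errors(transform_variances(cov, dct_matrix(n)))
b_none = basis_restriction_errors(transform_variances(cov, identity))
```

With no transform every coefficient carries the same energy, so the error
falls linearly with the number of coefficients kept, whereas the Cosine
transform concentrates most of the energy in its first coefficient.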
Another performance measure used in data compression applications
is the distortion-rate function, which is defined below. Let each
transform coefficient be independently quantized to a finite number of
levels, and let $d_i$ be the mean square error per unit variance due to
the quantization of the ith coefficient of array V. Then

$$D = \sum_{i=1}^{MN} \sigma_i^{2}\,d_i \tag{B-8}$$

gives the total mean square distortion in a transform coding system with
a noiseless channel. Let $n_i$ be the number of bits required to code the
output of the ith quantizer. Then the rate is given by

$$R_p = \sum_{i=1}^{MN} n_i \ \text{bits/sample}, \qquad n_i = \text{integer} \ge 0 \tag{B-9}$$

where the sequence $\{n_i\}$ is chosen such that D in (B-8) is minimized for
a fixed $R_p$. The D vs $R_p$ curves obtained from (B-8) and (B-9) are the
distortion-rate functions for an integer bit allocation scheme.
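One simple way to obtain an integer allocation for (B-8) and (B-9) is a
greedy marginal-return rule, sketched below. It is not the integer
programming algorithm of [86], and the per-coefficient distortion
d(n) = 2^(-2n) is a textbook high-resolution approximation assumed here in
place of the optimum quantizer values of [47]. For any decreasing, convex
d(n), giving each successive bit to the coefficient with the largest
distortion reduction minimizes D for the given bit budget. The function
name `allocate_bits` and the sample variances are hypothetical.

```python
import heapq


def allocate_bits(variances, total_bits, d=lambda n: 2.0 ** (-2 * n)):
    """Greedy integer bit allocation; returns (bits per coefficient, D)."""
    bits = [0] * len(variances)
    # Max-heap (via negated keys) of the distortion drop from one more bit.
    heap = [(-v * (d(0) - d(1)), i) for i, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        _, i = heapq.heappop(heap)
        bits[i] += 1
        n = bits[i]
        heapq.heappush(heap, (-variances[i] * (d(n) - d(n + 1)), i))
    distortion = sum(v * d(n) for v, n in zip(variances, bits))
    return bits, distortion


# Illustrative variances only; four coefficients sharing a budget of 4 bits.
variances = [16.0, 3.0, 0.9, 0.2]
bits, distortion = allocate_bits(variances, 4)
average_rate = sum(bits) / len(variances)   # bits per coefficient
```

Sweeping `total_bits` and recording the resulting distortion traces out a
distortion-rate curve of the kind described above.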

B.2  Experimental  Results: 

We compare the performance of the K-L, Cosine, Sine, Fourier, and
Hadamard discrete transforms, which are often considered for data
compression. The definitions and properties of these and some other
transforms can be found in [3,31,58]. We have chosen two block sizes which
are of interest in data compression, i.e., 8 x 8 and 16 x 16. For some of
the comparisons, the complexity of computing the eigenvalues of the matrix
$\mathcal{R}$ prohibits sizes larger than 8 x 8.

All the above transforms other than Fourier are real, and result in
MN real nonredundant transform coefficients for an array size of M x N.
Since the data is assumed to be real, of the MN complex Fourier
coefficients, only (MN/2 + 2) real and (MN/2 - 2) imaginary components are
nonredundant (due to symmetry). It is therefore sufficient to consider
the variances of these components in obtaining the sequence
$\{\sigma_k^{2}\}$.
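The count of nonredundant Fourier components can be checked with a short
sketch. It relies on the conjugate symmetry of the 2-D discrete Fourier
transform of real data, F(k,l) = F*(-k mod M, -l mod N): each conjugate
pair contributes one real and one imaginary component, and each coefficient
that is its own mirror image is purely real. The function name is
hypothetical, and the closed forms MN/2 + 2 and MN/2 - 2 apply when M and N
are both even.

```python
def nonredundant_fourier_components(M, N):
    """Count nonredundant (real, imaginary) DFT components for real M x N data."""
    real_parts = 0
    imag_parts = 0
    seen = set()
    for k in range(M):
        for l in range(N):
            if (k, l) in seen:
                continue
            mirror = ((-k) % M, (-l) % N)   # index of the conjugate coefficient
            seen.add((k, l))
            seen.add(mirror)
            real_parts += 1
            if mirror != (k, l):
                imag_parts += 1             # self-mirrored terms have no imaginary part
    return real_parts, imag_parts


counts_8 = nonredundant_fourier_components(8, 8)
counts_16 = nonredundant_fourier_components(16, 16)
```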

Figures B-1 through B-3 show the basis restriction errors for
the above mentioned transforms for M = 8, N = 8 and the three nonseparable
random fields described by (A-2), (A-3) and (A-4), and (A-5) in Appendix A.
For the isotropic field of (A-2), the values $\rho_1 = \rho_2 = .95$ were
chosen [53]. For the noncausal NC1 model of (A-3) and (A-4), the values of
the parameters were chosen to be

$$\alpha = .2496, \qquad \beta^{2} = .0744, \qquad \gamma = .95.$$

These values were found by a least squares fit of the model to the
16 x 16 measured covariances for the Girl image shown in Figure 6-10(a)
[34,35]. For the measured covariance model of (A-5), the same Girl image
data was used.

Figure B-4 shows the distortion-rate curves for the isotropic
model assuming a Gaussian distribution for the transform coefficients.
The distortion has been calculated based on optimum mean square
quantization [47] and the optimum integer bit allocation (i.e., integer
programming) algorithm of [86]. Figures B-5 and B-6 show the
distortion-rate curves for the isotropic model of (A-2) and the separable
model of (A-1) respectively, for $\rho_1 = \rho_2 = .95$, M = N = 16. For
this array size, the K-L transform was excluded due to computational
difficulties. Table B-1 gives the results of Figures B-4 and B-5 in
numerical form.

From Figures B-1 through B-4 we note that the performance of the
Cosine transform is very close to that of the optimum K-L transform for
all three nonseparable models, while the performance of the remaining
three transforms is not as close to the K-L but is quite close to each
other. The performance of the Cosine transform is about .05 dB inferior to
the K-L, while that of the others is about 1.0 dB inferior. At a bit rate
of 1 bit/pixel the Cosine would require a rate increase of approximately 1%
to match the performance of the K-L, while the others would require about a
25% increase. From Figures B-5 and B-6 we note that the relative
performance does not change much for a slightly larger array size or for
considerably different models.

Since the Cosine transform can be implemented by a fast algorithm
[2,15] and is data independent, its computational advantages over the K-L
outweigh the marginal difference in performance. The performance
differences between the Cosine, Sine, Fourier, and the K-L will decrease
further as the array size is increased, since all these sinusoidal
transforms are asymptotically equivalent [33].

Thus the prime advantage of Cosine transform coding remains
in the common situation where a larger image is coded block by block with
a typical block size of 16 x 16 or 8 x 8. Finally, we note that
recursive block-coding of random fields via fast K-L transform algorithms
[36] achieves rates close to, and even better (!) than, the conventional
K-L transform block-coding method, by coding the boundary variables of a
block separately and exploiting the interblock redundancy represented by
the boundary variables. Comparison with these algorithms is not made here
and is left as a future study.